fifth edition 


The x86 PC 


assembly language, 
design, and interfacing 


MUHAMMAD ALI MAZIDI 
JANICE GILLISPIE MAZIDI 
DANNY CAUSEY 




THE x86 PC 


Assembly Language, Design, and Interfacing 


Fifth Edition 


Muhammad Ali Mazidi 
Janice Gillispie Mazidi 


Danny Causey 


Prentice Hall 
Boston Columbus Indianapolis New York San Francisco Upper Saddle River 
Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto 


Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo 


Editor in Chief: Vernon Anthony
Acquisitions Editor: Wyatt Morris
Editorial Assistant: Chris Reed
Director of Marketing: David Gesell
Marketing Manager: Kara Clark
Marketing Assistant: Les Roberts
Senior Managing Editor: JoEllen Gohr
Project Manager: Rex Davidson
Senior Operations Supervisor: Pat Tonneman
Operations Specialist: Laura Weaver
Art Director: Candace Rowley
Cover Designer: Rachel Hirschi
Cover Art: Shutterstock
Printer/Binder: Edwards Brothers
Cover Printer: Lehigh-Phoenix Color
Text Font: Times Roman


Library of Congress Cataloging in Publication Data 


Mazidi, Muhammad Ali. 

The x86 PC : Assembly language, design, and interfacing / Muhammad Ali Mazidi, Janice 

Gillispie Mazidi, Danny Causey. -- 5th ed. 
p. cm. 

Includes bibliographical references and index. 

1. Intel 80x86 series microprocessors--Programming. 2. Assembler language (Computer program 
language) 3. IBM microcomputers--Programming. I. Mazidi, Janice Gillispie. II. Causey, Danny. 
III. Title. IV. Title: Assembly language, design, and interfacing.

QA76.8.I292M37 2010

005.265--dc22

2009016076 


Copyright © 2010, 2003, 2000, and 1998 Pearson Education, Inc., publishing as Prentice Hall. This 
title was published previously as The 80x86 IBM PC and Compatible Computers (Volumes I and II) 
Assembly Language, Design, and Interfacing. First edition © 1995 by Muhammad Ali Mazidi and 
Janice Gillispie Mazidi. All rights reserved. Manufactured in the United States of America. This pub- 
lication is protected by Copyright, and permission should be obtained from the publisher prior to any 
prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, 
electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use material 
from this work, please submit a written request to Pearson Education, Inc., Permissions Department, 
1 Lake Street, Upper Saddle River, New Jersey, 07458. 


Many of the designations by manufacturers and sellers to distinguish their products are claimed as 
trademarks. Where those designations appear in this book, and the publisher was aware of a trademark 
claim, the designations have been printed in initial caps or all caps. 


Prentice Hall 
is an imprint of 


PEARSON
www.pearsonhighered.com
ISBN-10: 0-13-502648-2


ISBN-13: 978-0-13-502648-9 


Trademark Information and Acknowledgments 


All the figures, tables, and instructions related to the x86 family of microprocessors 
used in this textbook belong to Intel. Copyright of Intel Corporation. 


MS-DOS and Windows are trademarks of Microsoft Corp. 


PC AT and PS/2 are registered trademarks of IBM Corporation. 




Regard man as a mine rich in gems of
inestimable value. Education can, alone, cause it
to reveal its treasures, and enable mankind to
benefit therefrom.


Baha'u'llah 


BRIEF CONTENTS 


CHAPTERS 

0: Introduction to Computing
1: The x86 Microprocessor
2: Assembly Language Programming
3: Arithmetic and Logic Instructions and Programs
4: INT 21H and INT 10H Programming and Macros
5: Keyboard and Mouse Programming
6: Signed Numbers, Strings, and Tables
7: Modules and Modular Programming
8: 32-bit Programming for x86
9: 8088, 80286 Microprocessors and ISA Bus
10: Memory and Memory Interfacing
11: 8255 I/O Programming
12: Interfacing to LCD, Motor, ADC, and Sensor
13: 8253/54 Timer
14: Interrupts in x86 PC
15: Direct Memory Access and DMA Channels in x86 PC
16: Video and Video Adapters
17: Serial Port Programming with Assembly and C#
18: Keyboard and Printer Interfacing
19: Hard Disks
20: The IEEE Floating Point and x87 Math Processors
21: 386 Microprocessor: Real vs. Protected Mode
22: High-Speed Memory Design and Cache
23: Pentium and RISC Processors
24: The Evolution of x86: from 32-bit to 64-bit
25: System Design Issues and Failure Analysis
26: ISA, PC104, and PCI Buses
27: USB Port Programming

APPENDICES

A: Debug Programming
B: x86 Instructions Description
C: Assembler Directives and Naming Rules
D: Interrupt Calls and Legacy Software
E: I/O Address Maps
F: ASCII Codes


CONTENTS 


CHAPTER 0: INTRODUCTION TO COMPUTING
SECTION 0.1: NUMBERING AND CODING SYSTEMS
SECTION 0.2: DIGITAL PRIMER
SECTION 0.3: INSIDE THE COMPUTER


CHAPTER 1: THE x86 MICROPROCESSOR
SECTION 1.1: BRIEF HISTORY OF THE x86 FAMILY
SECTION 1.2: INSIDE THE 8088/86
SECTION 1.3: INTRODUCTION TO ASSEMBLY PROGRAMMING
SECTION 1.4: INTRODUCTION TO PROGRAM SEGMENTS
SECTION 1.5: THE STACK
SECTION 1.6: FLAG REGISTER
SECTION 1.7: x86 ADDRESSING MODES


CHAPTER 2: ASSEMBLY LANGUAGE PROGRAMMING
SECTION 2.1: DIRECTIVES AND A SAMPLE PROGRAM
SECTION 2.2: ASSEMBLE, LINK, AND RUN A PROGRAM
SECTION 2.3: MORE SAMPLE PROGRAMS
SECTION 2.4: CONTROL TRANSFER INSTRUCTIONS
SECTION 2.5: DATA TYPES AND DATA DEFINITION
SECTION 2.6: FULL SEGMENT DEFINITION
SECTION 2.7: FLOWCHARTS AND PSEUDOCODE


CHAPTER 3: ARITHMETIC AND LOGIC INSTRUCTIONS AND PROGRAMS
SECTION 3.1: UNSIGNED ADDITION AND SUBTRACTION
SECTION 3.2: UNSIGNED MULTIPLICATION AND DIVISION
SECTION 3.3: LOGIC INSTRUCTIONS
SECTION 3.4: BCD AND ASCII CONVERSION
SECTION 3.5: ROTATE INSTRUCTIONS
SECTION 3.6: BITWISE OPERATORS IN THE C LANGUAGE


CHAPTER 4: INT 21H AND INT 10H PROGRAMMING AND MACROS
SECTION 4.1: BIOS INT 10H PROGRAMMING
SECTION 4.2: DOS INTERRUPT 21H
SECTION 4.3: WHAT IS A MACRO AND HOW IS IT USED?


CHAPTER 5: KEYBOARD AND MOUSE PROGRAMMING
SECTION 5.1: INT 16H KEYBOARD PROGRAMMING
SECTION 5.2: MOUSE PROGRAMMING WITH INT 33H


CHAPTER 6: SIGNED NUMBERS, STRINGS, AND TABLES
SECTION 6.1: SIGNED NUMBER ARITHMETIC OPERATIONS
SECTION 6.2: STRING AND TABLE OPERATIONS


CHAPTER 7: MODULES AND MODULAR PROGRAMMING
SECTION 7.1: WRITING AND LINKING MODULES
SECTION 7.2: SOME VERY USEFUL MODULES
SECTION 7.3: PASSING PARAMETERS AMONG MODULES

CHAPTER 8: 32-BIT PROGRAMMING FOR x86 
SECTION 8.1: 32-BIT PROGRAMMING IN x86 


CHAPTER 9: 8088, 80286 MICROPROCESSORS AND ISA BUS 
SECTION 9.1: 8088 MICROPROCESSOR 

SECTION 9.2: 8284 AND 8288 SUPPORTING CHIPS 

SECTION 9.3: 8-BIT SECTION OF ISA BUS 

SECTION 9.4: 80286 MICROPROCESSOR 

SECTION 9.5: 16-BIT ISA BUS 


CHAPTER 10: MEMORY AND MEMORY INTERFACING 
SECTION 10.1: SEMICONDUCTOR MEMORIES 

SECTION 10.2: MEMORY ADDRESS DECODING 

SECTION 10.3: IBM PC MEMORY MAP 

SECTION 10.4: DATA INTEGRITY IN RAM AND ROM 
SECTION 10.5: 16-BIT MEMORY INTERFACING 


CHAPTER 11: 8255 I/O PROGRAMMING 

SECTION 11.1: 8088 INPUT/OUTPUT INSTRUCTIONS 
SECTION 11.2: I/O ADDRESS DECODING AND DESIGN 
SECTION 11.3: I/O ADDRESS MAP OF x86 PCs 

SECTION 11.4: PROGRAMMING AND INTERFACING THE 8255 


CHAPTER 12: INTERFACING TO LCD, MOTOR, ADC, AND SENSOR 
SECTION 12.1: INTERFACING TO AN LCD 

SECTION 12.2: INTERFACING TO A STEPPER MOTOR 

SECTION 12.3: INTERFACING TO A DAC 

SECTION 12.4: INTERFACING TO ADC CHIPS AND SENSORS 


CHAPTER 13: 8253/54 TIMER 

SECTION 13.1: 8253/54 TIMER 

SECTION 13.2: x86 PC 8253/54 TIMER CONNECTION AND PROGRAMMING 
SECTION 13.3: GENERATING MUSIC ON THE x86 PC 


CHAPTER 14: INTERRUPTS IN x86 PC 

SECTION 14.1: 8088/86 INTERRUPTS 

SECTION 14.2: x86 PC AND INTERRUPT ASSIGNMENT 

SECTION 14.3: 8259 PROGRAMMABLE INTERRUPT CONTROLLER 
SECTION 14.4: USE OF THE 8259 CHIP IN x86 PCs 

SECTION 14.5: MORE ON INTERRUPTS IN x86 PCs 


CHAPTER 15: DIRECT MEMORY ACCESS AND DMA CHANNELS IN 
x86 PC 

SECTION 15.1: CONCEPT OF DMA 

SECTION 15.2: 8237 DMA CHIP PROGRAMMING 

SECTION 15.3: 8237 DMA INTERFACING IN THE IBM PC 

SECTION 15.4: DMA IN x86 PCs 




CHAPTER 16: VIDEO AND VIDEO ADAPTERS
SECTION 16.1: PRINCIPLES OF MONITORS AND VIDEO MODES
SECTION 16.2: TEXT MODE PROGRAMMING AND VIDEO RAM
SECTION 16.3: GRAPHICS AND GRAPHICS PROGRAMMING


CHAPTER 17: SERIAL PORT PROGRAMMING WITH ASSEMBLY AND C#
SECTION 17.1: BASICS OF SERIAL COMMUNICATION
SECTION 17.2: PROGRAMMING x86 PC COM PORTS USING ASSEMBLY AND C#


CHAPTER 18: KEYBOARD AND PRINTER INTERFACING
SECTION 18.1: INTERFACING THE KEYBOARD TO THE CPU
SECTION 18.2: PC KEYBOARD INTERFACING AND PROGRAMMING
SECTION 18.3: PRINTER AND PRINTER INTERFACING IN THE IBM PC


CHAPTER 19: HARD DISKS
SECTION 19.1: HARD DISK ORGANIZATION AND PERFORMANCE


CHAPTER 20: THE IEEE FLOATING POINT AND x87 MATH PROCESSORS
SECTION 20.1: MATH COPROCESSOR AND IEEE FLOATING-POINT STANDARDS
SECTION 20.2: x87 INSTRUCTIONS AND PROGRAMMING
SECTION 20.3: x87 INSTRUCTIONS


CHAPTER 21: 386 MICROPROCESSOR: REAL vs. PROTECTED MODE
SECTION 21.1: 80386 IN REAL MODE
SECTION 21.2: 80386: A HARDWARE VIEW
SECTION 21.3: 80386 PROTECTED MODE


CHAPTER 22: HIGH-SPEED MEMORY DESIGN AND CACHE
SECTION 22.1: MEMORY CYCLE TIME OF THE x86
SECTION 22.2: PAGE AND STATIC COLUMN DRAMS
SECTION 22.3: CACHE MEMORY
SECTION 22.4: SDRAM, DDR RAM, AND RAMBUS MEMORIES


CHAPTER 23: PENTIUM AND RISC PROCESSORS
SECTION 23.1: THE 80486 MICROPROCESSOR
SECTION 23.2: INTEL'S PENTIUM
SECTION 23.3: RISC ARCHITECTURE
SECTION 23.4: PENTIUM PRO PROCESSOR
SECTION 23.5: MMX TECHNOLOGY


CHAPTER 24: THE EVOLUTION OF x86: FROM 32-BIT TO 64-BIT
SECTION 24.1: x86 PENTIUM EVOLUTION
SECTION 24.2: 64-BIT PROCESSORS AND VISTA FOR x86




CHAPTER 25: SYSTEM DESIGN ISSUES AND FAILURE ANALYSIS 
SECTION 25.1: OVERVIEW OF IC TECHNOLOGY 
SECTION 25.2: IC INTERFACING AND SYSTEM DESIGN ISSUES 


CHAPTER 26: ISA, PC104, AND PCI BUSES 
SECTION 26.1: ISA BUS MEMORY SIGNALS 
SECTION 26.2: I/O BUS TIMING IN ISA BUS 
SECTION 26.3: PCI BUS 


CHAPTER 27: USB PORT PROGRAMMING 

SECTION 27.1: USB PORTS: AN OVERVIEW 

SECTION 27.2: USB PORT EXPANSION AND POWER MANAGEMENT 
SECTION 27.3: USB PORT PROGRAMMING 


APPENDIX A: DEBUG PROGRAMMING 

SECTION A.1: ENTERING AND EXITING DEBUG 

SECTION A.2: EXAMINING AND ALTERING REGISTERS 

SECTION A.3: CODING AND RUNNING PROGRAMS IN DEBUG 
SECTION A.4: DATA MANIPULATION IN DEBUG 

SECTION A.5: EXAMINING/ALTERING THE FLAG REGISTER IN DEBUG 


APPENDIX B: x86 INSTRUCTIONS DESCRIPTION 
SECTION B.1: THE 8086 INSTRUCTION SET 


APPENDIX C: ASSEMBLER DIRECTIVES AND NAMING RULES 
SECTION C.1: x86 ASSEMBLER DIRECTIVES 
SECTION C.2: RULES FOR LABELS AND RESERVED NAMES 


APPENDIX D: INTERRUPT CALLS AND LEGACY SOFTWARE 
SECTION D.1: 21H INTERRUPTS 

SECTION D.2: MOUSE INTERRUPTS 33H 

SECTION D.3: INT 10H 

SECTION D.4: INT 12H 

SECTION D.5: INT 14H 

SECTION D.6: INT 16H -- KEYBOARD 

SECTION D.7: INT 1AH 


APPENDIX E: I/O ADDRESS MAP 

SECTION E.1: ORIGINAL 80286 IBM PC I/O ADDRESS MAP 
SECTION E.2: Dell x86 PC I/O ADDRESS MAP 

APPENDIX F: ASCII CODES 


INDEX 




DEDICATIONS 


This book is dedicated to the memory of Muhammad Ali's parents, who 
raised 10 children and persevered through more than 50 years of hardship 
together with dignity and faith. 


We feel especially blessed to have the support, love, and encouragement 
of Janice's parents, whose kindness, wisdom, and sense of humor have 
been the bond that has welded us into a family. 


In addition, we must also mention our two most important collaborations: our
sons Robert Nabil and Michael Jamal, who have taught us the meaning
of love and patience.


We would also like to honor the memory of a dear friend, Kamran Lotfi.


— Muhammad Ali Mazidi 
and Janice Mazidi 


PREFACE 


Purpose 


This book is intended for use in college-level courses in which both Assembly 
language programming and x86 PC interfacing are discussed. It not only builds the foun- 
dation of Assembly language programming, but also provides a comprehensive treatment 
of x86 PC design and interfacing for students in engineering and computer science disci- 
plines. This volume is intended for those who wish to gain an in-depth understanding of 
the internal working of the x86 PC. It builds a foundation for the design and interfacing 
of microprocessor-based systems using the real-world example of the x86 PC. In addition, 
it can also be used by practicing technicians, hardware engineers, computer scientists, and 
hobbyists who want to do PC interfacing and data acquisition. 


Prerequisites 


Readers should have taken an introductory digital course. Knowledge of other 
programming languages would be helpful, but is not necessary. 

Although a vast majority of current PCs use x86 such as Pentium microprocessors, 
their design is based on the IBM PC/AT, an 80286 microprocessor system introduced in 
1984. A good portion of the features of the PC/AT, hence its limitations, are based on the 
original IBM PC, an 8088 microprocessor system, introduced in 1981. In other words, one 
cannot expect to understand fully the architectural philosophy of the x86 PC and its inter- 
nal architecture unless the 80286 PC/AT and its subset, the IBM PC/XT, are first under- 
stood. For this reason, we describe the 8088 and 80286 microprocessors in Chapter 9. 


Contents 


A systematic, step-by-step approach has been used in covering various aspects of 
Assembly language programming. Many examples and sample programs are given to 
clarify concepts and provide students an opportunity to learn by doing. Review questions 
are provided at the end of each section to reinforce the main points of the section. We feel 
that one of the functions of a textbook is to familiarize the student with terminology used 
in technical literature and in industry, so we have followed that guideline in this text. 

Chapter 0 covers concepts in number systems (binary, decimal, and hex) and com- 
puter architecture. Most students will have learned these concepts in previous courses, but 
Chapter 0 provides a quick overview for those students who have not learned these con- 
cepts, or who may need to refresh their memory. 

Chapter 1 provides a brief history of the evolution of x86 microprocessors and an
overview of the internal workings of the 8086 as a basis of all x86 processors. Chapter 1 
should be used in conjunction with Appendix A (a tutorial introduction to DEBUG) so that 
the student can experiment with concepts being learned on the x86 PC. The order of top- 
ics in Appendix A has been designed to correspond to the order of topics presented in 
Chapter 1. Thus, the student can begin programming with DEBUG without having to 
learn how to use an assembler. 

Chapter 2 explains the use of assemblers to create programs. Although the pro- 
grams in the book were developed and tested with Microsoft's MASM assembler, any 
Intel-compatible assembler such as Borland's TASM may be used. 

Chapter 3 introduces the bulk of the logic and arithmetic instructions for unsigned 
numbers, plus bitwise operations in C. 

Chapter 4 introduces DOS and BIOS interrupts. Programs in Assembly allow the 
student to get input from the keyboard and send output to the monitor. In addition, macro 
programming in assembly is described. 

Chapter 5 describes how to program the keyboard and mouse. 

Chapter 6 covers arithmetic and logic instructions for signed numbers as well as 
string processing instructions. 



Chapter 7 discusses modular programming and how to develop larger Assembly 
language programs by breaking them into smaller modules to be coded and tested sepa- 
rately. In addition, in-line Assembly language within C programs is explained. 

Chapter 8 introduces some 32-bit concepts of x86 programming. Although this
book emphasizes 16-bit programming, the 386 is introduced to help the student
appreciate the power of 32-bit CPUs.

Chapter 9 describes the 8088 and 286 microprocessors and supporting chips in 
detail and shows how they are used in the original IBM PC/XT/AT. In addition, the origin 
and function of the address, data, and control signals of the ISA expansion slot are 
described. 

Chapter 10 provides an introduction to various types of RAM and ROM memo- 
ries, their interfacing to the microprocessor, the memory map of the x86 PC, the timing 
issue in interfacing memory, and the checksum byte and parity bit techniques of ensuring 
data integrity in RAM and ROM. 

Chapter 11 is dedicated to the interfacing of I/O ports, the use of IN and OUT 
instructions in the x86, and interfacing and programming of the 8255 programmable 
peripheral chip. We describe I/O programming in several languages, as well. 

Chapter 12 covers the interfacing of PCs to devices for data acquisition such as 
LCDs, stepper motors, ADC, DAC, and sensors. 

Chapter 13 discusses the use of the 8253/54 timer chip in the x86 PC, as well as 
how to generate music and time delays. 

Chapter 14 is dedicated to the explanation of hardware and software interrupts, 
the use of the 8259 interrupt controller, the origin and assignment of IRQ signals on the 
expansion slots of the ISA bus, and exception interrupts in 80x86 microprocessors. 

Chapter 15 is dedicated to direct memory access (DMA) concepts, the use of the 
8237 DMA chip in the x86 PC, and DMA channels and associated signals on the ISA bus. 

Chapter 16 covers the basics of video monitors and various video modes and 
adapters of the PC, in addition to the memory requirements of various video boards in 
graphics mode. 

Chapter 17 discusses serial communication principles and programming of the PC 
COM port in Assembly and C. 

Chapter 18 covers the interfacing and programming of the keyboard in the x86 
PC, in addition to printer port interfacing and programming. In addition, a discussion of 
various types of parallel ports such as EPP and ECP is included. 

Chapter 19 discusses hard disk storage organization and terminology. 

Chapter 20 examines the x87 math coprocessor, its programming, and IEEE sin- 
gle- and double-precision floating point data types. 

Chapter 21 explores the programming and hardware of the 386 microprocessor, 
contrasts and explains real and protected modes, and discusses the implementation of vir- 
tual memory. 

Chapter 22 is dedicated to the interfacing of high-speed memories and describes 
various types of DRAM, including EDO and SDRAM, and examines cache memory and 
various cache organizations and terminology in detail. 

In Chapter 23 we describe the main features of the 486, Pentium, and Pentium Pro 
and compare these microprocessors with the RISC processors. Chapter 23 also provides a 
discussion of MMX technology and how to write programs to detect which CPU a PC has. 

Chapter 24 examines the new generation of 64-bit microprocessors from Intel and 
AMD. 

Chapter 25 provides an overview of the IC technology and failure analysis, 
describes IC interfacing and system design issues, and covers error detection and correc- 
tion. 

Chapter 26 is dedicated to the discussion of the various types of PC buses, such 
as ISA and PC104, their performance comparisons, and features of the PCI bus. 

Chapter 27 describes the USB port in detail and shows how to use the C pro- 
gramming language to access USB devices connected to the USB port of x86 PCs. 




Appendices 


The appendices have been designed to provide all reference material required 
for the topics covered in this combined volume so that no additional references should be 
necessary. 

Appendix A provides a tutorial introduction to DEBUG. Appendix B provides a
listing of Intel's 8086 instructions. Appendix C describes assembler directives with
examples of their use. Appendix D lists commonly used interrupt function calls and legacy
software. Appendix E lists the I/O maps of x86 PCs. Appendix F provides a table of ASCII
codes.


Lab Manual 


The lab manual contains some very basic labs and can be found at the 
www.MicroDigitalEd.com website. The more advanced and rigorous lab assignments are 
left up to the instructor depending on the course objectives, class level, and whether the 
course is graduate or undergraduate. The support materials for this and other books by the 
authors can be found on this website, too. 


Solutions Manual/PowerPoint® Slides 


The end-of-chapter problems cover some very basic concepts. The more chal- 
lenging and rigorous homework assignments are left up to the instructor depending on the 
course objectives, class level, and whether the course is graduate or undergraduate. The 
solutions manual was produced with the help of Mr. Sepehr Naimi. The solutions manual 
and PowerPoint® slides for the drawings are available online for instructors only. 


Online Instructor Resources 


To access supplementary materials online, instructors need to request an instruc- 
tor access code. Go to www.prenhall.com, click the Instructor Resource Center link, 
and then click Register Today for an instructor access code. Within 48 hours after regis- 
tering you will receive a confirming e-mail including an instructor access code. Once you 
have received your code, go to the site and log on for full instructions on downloading the 
materials you wish to use. 


Acknowledgments 


This book is the result of the dedication, work, and love of many individuals. 
Thanks must go to the many students and professors around the world whose comments 
have helped to shape this book since its original publication in 1992. Special thanks to 
Dimitri Moonen for his detailed reading of the book and pointing out errors. Thanks to 
Pedran Mazidi for redoing the tables and figures for this new edition. 

We would like to thank the people at Prentice Hall, in particular our editor Wyatt 
Morris, who continues to support and encourage our writing, and our project manager Rex 
Davidson, who made the book a reality. We were lucky to get the best copy editor in the 
world, Bret Workman. Thank you for your fantastic job, as usual. 

Finally, we would like to sincerely thank Dr. Roger S. Walker of the Computer 
Science Engineering Department, University of Texas at Arlington for his constant 
encouragement. We enjoyed writing this book, and hope you enjoy reading it and using 
it for your courses and projects. Please let us know if you have any suggestions or find 
any errors. 




ABOUT THE AUTHORS 


Muhammad Ali Mazidi went to Tabriz University and holds Master’s degrees 
from both Southern Methodist University and the University of Texas at Dallas. He is cur- 
rently a.b.d. on his Ph.D. in the Electrical Engineering Department of Southern Methodist 
University. He is co-author of some widely used textbooks, including The 8051 
Microcontroller and Embedded Systems, The PIC Microcontroller and Embedded 
Systems, and The HCS12 Microcontroller and Embedded Systems, also available from 
Prentice Hall. He teaches microprocessor-based system design at DeVry University in 
Dallas, Texas. He is the founder of MicroDigitalEd.com, which provides support for this 
book. 


Janice Gillispie Mazidi has a Master of Science degree in Computer Science from 
the University of North Texas. She has several years of experience as a software engineer in
Dallas. She is currently teaching computer science and math courses in a local high 
school. 


Danny Causey is a graduate of DeVry University and holds a Bachelor of
Science in Computer Engineering Technology. He is co-author of The PIC 
Microcontroller and Embedded Systems and The HCS12 Microcontroller and 
Embedded Systems. He is a partner in MicroDigitalEd.com. He works in the software 
development field, providing tools to accelerate daily business. 


The authors can be contacted at the following email addresses if you have any 
comments or suggestions, or if you find any errors. 


mdebooks@yahoo.com 
mmazidi@microdigitaled.com 
dcausey@microdigitaled.com 




CHAPTER 0 


INTRODUCTION TO 
COMPUTING 


OBJECTIVES 
Upon completion of this chapter, you will be able to: 


>> Convert any number from base 2, base 10, or base 16 to either of the 
other two bases 

>> Add and subtract hex numbers 

>> Add binary numbers 

>> Represent any binary number in 2’s complement 

>> Represent an alphanumeric string in ASCII code

>> Describe logical operations AND, OR, NOT, XOR, NAND, NOR 

>> Use logic gates to diagram simple circuits 

>> Explain the difference between a bit, a nibble, a byte, and a word 

>> Give precise mathematical definitions of the terms kilobyte, megabyte, 
gigabyte, and terabyte 

>> Explain the difference between RAM and ROM and describe their use 

>> Describe the purpose of the major components of a computer system 

>> List the three types of buses found in computers and describe the 
purpose of each type of bus 

>> Describe the role of the CPU in computer systems 

>> List the major components of the CPU and describe the purpose of each 


To understand the software and hardware of a microcontroller-based system, one 
must first master some very basic concepts underlying computer design. In this chapter 
(which in the tradition of digital computers is called Chapter 0), the fundamentals of num- 
bering and coding systems are presented. After an introduction to logic gates, an overview 
of the workings inside the computer is given. Finally, in the last section we give a brief 
history of CPU architecture. Although some readers may have an adequate background in 
many of the topics of this chapter, it is recommended that the material be scanned, how- 
ever briefly. 


SECTION 0.1: NUMBERING AND CODING SYSTEMS 


Whereas human beings use base 10 (decimal) arithmetic, computers use the base
2 (binary) system. In this section we explain how to convert from the decimal system to
the binary system, and vice versa. The convenient representation of binary numbers in
base 16, called hexadecimal, also is covered. Finally, the binary format of the
alphanumeric code, called ASCII, is explored.


Decimal and binary number systems 


Although there has been speculation that the origin of the base 10 system is the
fact that human beings have 10 fingers, there is no such uncertainty about the reason
behind the use of the binary system in computers. The binary system is used in
computers because 1 and 0 represent the two voltage levels of on and off. Whereas in base 10
there are 10 distinct symbols, 0, 1, 2, ..., 9, in base 2 there are only two, 0 and 1, with
which to generate numbers. These two binary digits, 0 and 1, are commonly referred to
as bits.


Converting from decimal to binary 


One method of converting from decimal to binary is to divide the decimal num- 
ber by 2 repeatedly, keeping track of the remainders. This process continues until the quo- 
tient becomes zero. The remainders are then written in reverse order to obtain the binary 
number. This is demonstrated in Example 0-1. 


Example 0-1
Convert 25₁₀ to binary.

Solution:

          Quotient   Remainder
25/2 =       12          1      LSB (least significant bit)
12/2 =        6          0
 6/2 =        3          0
 3/2 =        1          1
 1/2 =        0          1      MSB (most significant bit)

Therefore, 25₁₀ = 11001₂.
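
The repeated-division method of Example 0-1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name decimal_to_binary is hypothetical, and the sketch handles non-negative integers only.

```python
def decimal_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)        # n becomes the quotient, r is the remainder
        remainders.append(str(r))  # the first remainder collected is the LSB
    # The remainders are written in reverse order, so the MSB comes first.
    return "".join(reversed(remainders))


print(decimal_to_binary(25))  # 11001
```

The loop stops when the quotient reaches zero, which matches the stopping rule given in the text.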


Converting from binary to decimal 


To convert from binary to decimal, it is important to understand the concept of weight
associated with each digit position. First, as an analogy, recall the weight of numbers in
the base 10 system, as shown in the diagram.

740683₁₀ =
3 × 10⁰ =        3
8 × 10¹ =       80
6 × 10² =      600
0 × 10³ =     0000
4 × 10⁴ =    40000
7 × 10⁵ =   700000
            ------
            740683

By the same token, each digit position in a number in base 2 has a weight associated
with it:

110101₂ =                 Decimal     Binary
1 × 2⁰ = 1 × 1  =             1            1
0 × 2¹ = 0 × 2  =             0           00
1 × 2² = 1 × 4  =             4          100
0 × 2³ = 0 × 8  =             0         0000
1 × 2⁴ = 1 × 16 =            16        10000
1 × 2⁵ = 1 × 32 =            32       100000
                            ---       ------
                             53       110101


Knowing the weight of each bit in a binary number makes it simple to add them 
together to get its decimal equivalent, as shown in Example 0-2. 


Example 0-2
Convert 11001₂ to decimal.

Solution:
Weight:   16    8    4    2    1
Digits:    1    1    0    0    1
Sum:      16 +  8 +  0 +  0 +  1 = 25₁₀
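
As a minimal sketch of the weight method used in Example 0-2, the following Python function multiplies each bit by the weight of its position and sums the results. The function name binary_to_decimal is an assumption made for illustration.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum the weights (1, 2, 4, 8, ...) of every position that holds a 1."""
    total = 0
    weight = 1                      # weight of the rightmost bit is 2^0 = 1
    for bit in reversed(bits):      # walk from the LSB to the MSB
        if bit not in ("0", "1"):
            raise ValueError("input must contain only 0s and 1s")
        total += int(bit) * weight
        weight *= 2                 # each position to the left doubles the weight
    return total


print(binary_to_decimal("11001"))   # 25
print(binary_to_decimal("110101"))  # 53
```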


Knowing the weight associated with each binary bit position allows one to con- 
vert a decimal number to binary directly instead of going through the process of repeated 
division. This is shown in Example 0-3. 


Example 0-3
Use the concept of weight to convert 39₁₀ to binary.

Solution:
Weight:   32   16    8    4    2    1
           1    0    0    1    1    1
          32 +  0 +  0 +  4 +  2 +  1 = 39

Therefore, 39₁₀ = 100111₂.
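
The direct conversion of Example 0-3 can be sketched as follows: find the largest weight that fits in the number, then walk down the weights, writing a 1 whenever the weight can be subtracted from what remains. The function name decimal_to_binary_by_weight is hypothetical, and only non-negative integers are handled.

```python
def decimal_to_binary_by_weight(n: int) -> str:
    """Convert a non-negative integer to binary using positional weights."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if n == 0:
        return "0"
    weight = 1
    while weight * 2 <= n:      # find the largest power of 2 not exceeding n
        weight *= 2
    bits = []
    while weight >= 1:
        if n >= weight:         # this weight is part of the sum: write a 1
            bits.append("1")
            n -= weight
        else:                   # this weight is not needed: write a 0
            bits.append("0")
        weight //= 2            # move to the next lower weight
    return "".join(bits)


print(decimal_to_binary_by_weight(39))  # 100111
```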




Hexadecimal system 


Base 16, or the hexadecimal system as it is called in com- 
puter literature, is used as a convenient representation of binary 
numbers. For example, it is much easier for a human being to 
represent a string of 0s and 1s such as 100010010110 as its hexa- 
decimal equivalent of 896H. The binary system has 2 digits, 0 
and 1. The base 10 system has 10 digits, 0 through 9. The hexa- 
decimal (base 16) system has 16 digits. In base 16, the first 10 
digits, 0 to 9, are the same as in decimal, and for the remaining 
six digits, the letters A, B, C, D, E, and F are used. Table 0-1 
shows the equivalent binary, decimal, and hexadecimal represen- 
tations for 0 to 15. 


Converting between binary and hex 


To represent a binary number as its equivalent hexadeci- 
mal number, start from the right and group 4 bits at a time, replac- 
ing each 4-bit binary number with its hex equivalent shown in 
Table 0-1. To convert from hex to binary, each hex digit is 
replaced with its 4-bit binary equivalent. See Examples 0-4 and 
0-5. 


Example 0-4
Represent binary 100111110101 in hex.

Solution:

First the number is grouped into sets of 4 bits: 1001 1111 0101.

Then each group of 4 bits is replaced with its hex equivalent:

1001   1111   0101
  9      F      5

Therefore, 100111110101₂ = 9F5 hexadecimal.


Example 0-5
Convert hex 29B to binary.

Solution:

  2      9      B
0010   1001   1011

Dropping the leading zeros gives 1010011011.
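
The grouping procedure of Examples 0-4 and 0-5 can be sketched in Python as shown here. The names HEX_DIGITS, binary_to_hex, and hex_to_binary are assumptions chosen for this illustration, not part of any library.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Group bits in fours from the right and replace each group by a hex digit."""
    # Pad on the left with zeros so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_binary(hex_digits: str) -> str:
    """Replace each hex digit by its 4-bit equivalent, then drop leading zeros."""
    bits = "".join(format(int(digit, 16), "04b") for digit in hex_digits)
    return bits.lstrip("0") or "0"


print(binary_to_hex("100111110101"))  # 9F5
print(hex_to_binary("29B"))           # 1010011011
```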


Table 0-1: Base 16 Number Systems

Decimal   Binary   Hex
   0       0000     0
   1       0001     1
   2       0010     2
   3       0011     3
   4       0100     4
   5       0101     5
   6       0110     6
   7       0111     7
   8       1000     8
   9       1001     9
  10       1010     A
  11       1011     B
  12       1100     C
  13       1101     D
  14       1110     E
  15       1111     F


Converting from decimal to hex


Converting from decimal to hex could be approached in two ways:

1. Convert to binary first and then convert to hex. Example 0-6 shows this
   method of converting decimal to hex.
2. Convert directly from decimal to hex by repeated division, keeping track of the
   remainders. Experimenting with this method is left to the reader.



Example 0-6
(a) Convert 45₁₀ to hex.

    32   16    8    4    2    1
     1    0    1    1    0    1

    First, convert to binary: 32 + 8 + 4 + 1 = 45
    45₁₀ = 0010 1101₂ = 2D hex

(b) Convert 629₁₀ to hex.

    512  256  128   64   32   16    8    4    2    1
      1    0    0    1    1    1    0    1    0    1

    629₁₀ = (512 + 64 + 32 + 16 + 4 + 1) = 0010 0111 0101₂ = 275 hex

(c) Convert 1714₁₀ to hex.

    1714₁₀ = (1024 + 512 + 128 + 32 + 16 + 2) = 0110 1011 0010₂ = 6B2 hex
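
The second approach listed earlier, repeated division by 16, is left to the reader in the text. A minimal sketch of it is given here so that the results can be compared with Example 0-6. The name decimal_to_hex is hypothetical, and the sketch assumes a non-negative integer input.

```python
HEX_DIGITS = "0123456789ABCDEF"


def decimal_to_hex(n: int) -> str:
    """Convert a non-negative integer to hex by repeated division by 16."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, 16)          # r is in 0..15 and maps to one hex digit
        digits.append(HEX_DIGITS[r])  # the first remainder is the rightmost digit
    return "".join(reversed(digits))


for value in (45, 629, 1714):
    print(value, "=", decimal_to_hex(value), "hex")  # 2D, 275, 6B2
```

Both approaches give the same answers; the direct one simply skips the intermediate binary form.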


Converting from hex to decimal 


Conversion from hex to decimal can also be approached in two ways:

1. Convert from hex to binary and then to decimal. Example 0-7 demonstrates this
   method of converting from hex to decimal.
2. Convert directly from hex to decimal by summing the weight of all digits.


Example 0-7 


Convert the following hexadecimal numbers to decimal. 


(a) 6B2₁₆ = 0110 1011 0010₂

1024  512  256  128  64  32  16  8  4  2  1
1     1    0    1    0   1   1   0  0  1  0

1024 + 512 + 128 + 32 + 16 + 2 = 1714₁₀

(b) 9F2D₁₆ = 1001 1111 0010 1101₂

32768 + 4096 + 2048 + 1024 + 512 + 256 + 32 + 8 + 4 + 1 = 40,749₁₀
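
The direct method, summing the weight of each hex digit, can be sketched as follows. The function name hex_to_decimal is hypothetical; each digit is multiplied by 16 raised to its position, counting from 0 at the right.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_decimal(hex_digits):
    """Sum digit * 16**position over all digits, position 0 being the rightmost."""
    total = 0
    for position, digit in enumerate(reversed(hex_digits.upper())):
        total += HEX_DIGITS.index(digit) * 16 ** position
    return total


print(hex_to_decimal("6B2"))   # 6*256 + 11*16 + 2
print(hex_to_decimal("9F2D"))  # 9*4096 + 15*256 + 2*16 + 13
```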


CHAPTER 0: INTRODUCTION TO COMPUTING 5 


Counting in bases 10, 2, and 16


To show the relationship between all three bases, in Table 0-2 we show the sequence of
numbers from 0 to 31 in decimal, along with the equivalent binary and hex numbers. Notice
in each base that when one more is added to the highest digit, that digit becomes zero
and a 1 is carried to the next-highest digit position. For example, in decimal, 9 + 1 = 0
with a carry to the next-highest position. In binary, 1 + 1 = 0 with a carry; similarly,
in hex, F + 1 = 0 with a carry.


Table 0-2: Counting in Bases

Decimal  Binary  Hex
0        00000   0
1        00001   1
2        00010   2
3        00011   3
4        00100   4
5        00101   5
6        00110   6
7        00111   7
8        01000   8
9        01001   9
10       01010   A
11       01011   B
12       01100   C
13       01101   D
14       01110   E
15       01111   F
16       10000   10
17       10001   11
18       10010   12
19       10011   13
20       10100   14
21       10101   15
22       10110   16
23       10111   17
24       11000   18
25       11001   19
26       11010   1A
27       11011   1B
28       11100   1C
29       11101   1D
30       11110   1E
31       11111   1F


Addition of binary and hex numbers


The addition of binary numbers is a very straightforward process. Table 0-3 shows the
addition of two bits. The discussion of subtraction of binary numbers is bypassed since
all computers use the addition process to implement subtraction. Although computers have
adder circuitry, there is no separate circuitry for subtractors. Instead, adders are used
in conjunction with 2’s complement circuitry to perform subtraction. In other words, to
implement “x - y”, the computer takes the 2’s complement of y and adds it to x. The
concept of 2’s complement is reviewed next. Example 0-8 shows the addition of binary
numbers.


Table 0-3: Binary Addition

A + B   Carry  Sum
0 + 0   0      0
0 + 1   0      1
1 + 0   0      1
1 + 1   1      0


Example 0-8 


Add the following binary numbers. Check against their decimal equivalents. 


Solution: 


  Binary    Decimal
   1101       13
+  1001        9
  10110       22


2’s complement 


To get the 2’s complement of a binary number, invert all the bits and then add 1
to the result. Inverting the bits is simply a matter of changing all 0s to 1s and 1s to 0s.
This is called the 1’s complement. See Example 0-9.


Example 0-9 


Take the 2’s complement of 10011101. 


Solution: 
  10011101    binary number
  01100010    1’s complement
+        1
  01100011    2’s complement
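
The two steps of Example 0-9 (invert, then add 1) can be sketched in Python, together with the idea stated earlier that a computer performs x - y by adding the 2’s complement of y to x. The function names and the 8-bit default width are assumptions made for this illustration.

```python
def twos_complement(bits):
    """Invert every bit (1's complement), add 1, and keep the original width."""
    width = len(bits)
    ones = "".join("1" if b == "0" else "0" for b in bits)   # 1's complement
    value = (int(ones, 2) + 1) % (1 << width)                # add 1, discard overflow
    return format(value, "0{}b".format(width))


def subtract_by_adding(x, y, width=8):
    """Compute x - y using only addition, as an adder with 2's complement would."""
    mask = (1 << width) - 1
    y_complement = ((y ^ mask) + 1) & mask   # invert the bits of y, then add 1
    return (x + y_complement) & mask         # the carry out of the top bit is dropped


print(twos_complement("10011101"))  # Example 0-9
print(subtract_by_adding(9, 5))     # 9 - 5 done by addition
```

Dropping the carry out of the most significant bit is what makes the sum come out as the difference when the width is fixed.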


Addition and subtraction of hex numbers 


In studying issues related to software and hardware of computers, it is often nec- 
essary to add or subtract hex numbers. Mastery of these techniques is essential. Hex addi- 
tion and subtraction are discussed separately below. 


Addition of hex numbers 


This section describes the process of adding hex numbers. Starting with the least
significant digits, the digits are added together. If the result is less than 16, write that digit
as the sum for that position. If it is 16 or greater, subtract 16 from it to get the digit and
carry 1 to the next digit. The best way to explain this is by example, as shown in Example
0-10.


Example 0-10 
Perform hex addition: 23D9 + 94BE. 


Solution: 
  23D9    LSD: 9 + 14 = 23         23 - 16 = 7 with a carry
+ 94BE         1 + 13 + 11 = 25    25 - 16 = 9 with a carry
  B897         1 + 3 + 4 = 8
          MSD: 2 + 9 = B
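
The digit-by-digit procedure of Example 0-10 can be sketched as below. The function name add_hex is hypothetical. A column sum of 16 or more produces a digit of (sum - 16) and a carry of 1 into the next column.

```python
HEX_DIGITS = "0123456789ABCDEF"


def add_hex(a, b):
    """Add two hex strings column by column, starting at the least significant digit."""
    width = max(len(a), len(b))
    a, b = a.upper().zfill(width), b.upper().zfill(width)
    carry = 0
    result = []
    for da, db in zip(reversed(a), reversed(b)):
        total = HEX_DIGITS.index(da) + HEX_DIGITS.index(db) + carry
        if total >= 16:          # too big for one hex digit
            total -= 16
            carry = 1
        else:
            carry = 0
        result.append(HEX_DIGITS[total])
    if carry:                    # a final carry becomes a new leading digit
        result.append("1")
    return "".join(reversed(result))


print(add_hex("23D9", "94BE"))   # Example 0-10
```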


Subtraction of hex numbers 


In subtracting two hex numbers, if the second digit is greater than the first, bor- 
row 16 from the preceding digit. See Example 0-11. 


Example 0-11 
Perform hex subtraction: 59F - 2B8.


Solution: 


  59F    LSD: 8 from 15 = 7
- 2B8         11 from 25 (9 + 16) = 14 (E)
  2E7         2 from 4 (5 - 1) = 2
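
The borrow rule of Example 0-11 can be sketched the same way. The function name sub_hex is hypothetical, and the sketch assumes the first number is not smaller than the second.

```python
HEX_DIGITS = "0123456789ABCDEF"


def sub_hex(a, b):
    """Subtract hex string b from a (assumes a >= b), borrowing 16 when needed."""
    width = max(len(a), len(b))
    a, b = a.upper().zfill(width), b.upper().zfill(width)
    borrow = 0
    result = []
    for da, db in zip(reversed(a), reversed(b)):
        top = HEX_DIGITS.index(da) - borrow   # pay back a borrow from the previous column
        bottom = HEX_DIGITS.index(db)
        if bottom > top:                      # second digit is greater: borrow 16
            top += 16
            borrow = 1
        else:
            borrow = 0
        result.append(HEX_DIGITS[top - bottom])
    return "".join(reversed(result)).lstrip("0") or "0"


print(sub_hex("59F", "2B8"))   # Example 0-11
```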




ASCII code 


The discussion so far has revolved around the representation of number systems.
Since all information in the computer must be represented by 0s and 1s, binary patterns
must be assigned to letters and other characters. In the 1960s a standard representation
called ASCII (American Standard Code for Information Interchange) was established.
The ASCII (pronounced “ask-E”) code assigns binary patterns for numbers 0 to 9, all the
letters of the English alphabet, both uppercase (capital) and lowercase, and many control
codes and punctuation marks. The great advantage of this system is that it is used by most
computers, so that information can be shared among computers. The ASCII system uses
a total of 7 bits to represent each code. For example, 100 0001 is assigned to the uppercase
letter “A” and 110 0001 is for the lowercase “a”. Often, a zero is placed in the most
significant bit position to make it an 8-bit code. Figure 0-1 shows ASCII codes. A complete
list of extended ASCII codes is given in Appendix F. The use of ASCII is not only
standard for keyboards used in the United States and many other countries but also provides
a standard for printing and displaying characters by output devices such as printers
and monitors.

Notice that the pattern of ASCII codes was designed to allow for easy manipulation
of ASCII data. For example, digits 0 through 9 are represented by ASCII codes 30H
through 39H. This enables a program to easily convert ASCII to decimal by masking off the
“3” in the upper nibble. Also notice that there is a relationship between the uppercase and
lowercase letters. The uppercase letters are represented by ASCII codes 41H through 5AH
while lowercase letters are represented by codes 61H through 7AH. Looking at the binary
code, the only bit that is different between the uppercase “A” and lowercase “a” is bit 5.
Therefore, conversion between uppercase and lowercase is as simple as changing bit 5 of
the ASCII code.
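
Both tricks can be tried out in a short Python sketch. The function names are hypothetical, and the sketch assumes the input really is an ASCII digit (for the first function) or an ASCII letter (for the second); no checking is done.

```python
def ascii_digit_to_value(ch):
    """Mask off the upper nibble (the '3') of an ASCII digit, leaving 0 through 9."""
    return ord(ch) & 0x0F


def toggle_case(ch):
    """Flip bit 5 (weight 20H) to switch an ASCII letter between upper and lower case."""
    return chr(ord(ch) ^ 0x20)


print(format(ord("A"), "07b"), format(ord("a"), "07b"))  # the 7-bit codes differ only in bit 5
print(ascii_digit_to_value("7"))
print(toggle_case("A"), toggle_case("z"))
```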


Figure 0-1. Selected ASCII Codes


Review Questions 


1. Why do computers use the binary number system instead of the decimal system? 

2. Convert 34₁₀ to binary and hex.

3. Convert 110101₂ to hex and decimal.

4. Perform binary addition: 101100 + 101.

5. Convert 101100₂ to its 2’s complement representation.


SECTION 0.2: DIGITAL PRIMER 


This section gives an overview of digital logic 
and design. First, we cover binary logic operations, 
then we show gates that perform these functions. Next, 
logic gates are put together to form simple digital cir- 
cuits. Finally, we cover some logic devices commonly 
found in microcontroller interfacing. 


Binary logic 


As mentioned earlier, computers use the bina- 
ry number system because the two voltage levels can 
be represented as the two digits 0 and 1. Signals in dig- 
ital electronics have two distinct voltage levels. For 
example, a system may define 0 V as logic 0 and +5 V 
as logic 1. Figure 0-2 shows this system with the built- 
in tolerances for variations in the voltage. A valid dig- 
ital signal in this example should be within either of the 
two shaded areas. 


Logic gates 


Binary logic gates are simple circuits that take 
one or more input signals and send out one output sig- 
nal. Several of these gates are defined below. 


AND gate 


The AND gate takes two or more inputs and
performs a logic AND on them. See the truth table and
diagram of the AND gate. Notice that if both inputs to
the AND gate are 1, the output will be 1. Any other
combination of inputs will give a 0 output. The example
shows two inputs, x and y. Multiple inputs are also
possible for logic gates. In the case of AND, if all
inputs are 1, the output is 1. If any input is 0, the
output is zero.


OR gate 


The OR logic function will output a 1 if one or 
more inputs is 1. If all inputs are 0, then and only then 
will the output be 0. 


Tri-state buffer 


A buffer gate does not change the logic level of 
the input. It is used to isolate or amplify the signal. 




Figure 0-2. Binary Signals 


Logical AND Function

Inputs  Output
X  Y    X AND Y
0  0    0
0  1    0
1  0    0
1  1    1


Inverter 


The inverter, also called NOT, outputs the 
value opposite to that input to the gate. That is, a 1 
input will give a 0 output, while a 0 input will give a 
1 output. 


XOR gate 


The XOR gate performs an exclusive-OR 
operation on the inputs. Exclusive-OR produces a 1 
output if one (but only one) input is 1. If both 
operands are 0, the output is zero. Likewise, if both 
operands are 1, the output is also zero. Notice from 
the XOR truth table, that whenever the two inputs 
are the same, the output is zero. This function can be 
used to compare two bits to see if they are the same. 


NAND and NOR gates 


The NAND gate functions like an AND gate 
with an inverter on the output. It produces a zero out- 
put when all inputs are 1; otherwise, it produces a 1 
output. The NOR gate functions like an OR gate with 
an inverter on the output. It produces a 1 if all inputs
are 0; otherwise, it produces a 0. NAND and NOR 
gates are used extensively in digital design because 
they are easy and inexpensive to fabricate. Any cir- 
cuit that can be designed with AND, OR, XOR, and 
INVERTER gates can be implemented using only 
NAND and NOR gates. A simple example of this is 
given below. Notice in NAND, that if any input is 
zero, the output is one. Notice in NOR, that if any 
input is one, the output is zero. 
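
To illustrate the claim that other gates can be built from NAND, the Python sketch below constructs NOT, AND, OR, NOR, and XOR from a single nand function. This is one well-known construction offered as an illustration, not the circuit drawn in the book; all function names are hypothetical.

```python
def nand(x, y):
    """NAND: output is 0 only when both inputs are 1."""
    return 0 if (x == 1 and y == 1) else 1


def not_(x):
    return nand(x, x)                  # tying both inputs together gives an inverter


def and_(x, y):
    return not_(nand(x, y))            # NAND followed by an inverter


def or_(x, y):
    return nand(not_(x), not_(y))      # invert both inputs, then NAND


def nor(x, y):
    return not_(or_(x, y))             # OR followed by an inverter


def xor(x, y):
    n = nand(x, y)
    return nand(nand(x, n), nand(y, n))  # the usual four-NAND exclusive-OR


print("X Y  AND OR NOR XOR")
for x in (0, 1):
    for y in (0, 1):
        print(x, y, "", and_(x, y), " ", or_(x, y), " ", nor(x, y), " ", xor(x, y))
```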


Logic design using gates 


Next we will show a simple logic design to 
add two binary digits. If we add two binary digits
there are four possible outcomes:

          Carry  Sum
0 + 0 =   0      0
0 + 1 =   0      1
1 + 0 =   0      1
1 + 1 =   1      0


Logical Inverter

Input  Output
X      NOT X
0      1
1      0


Logical XOR Function

Inputs  Output
X  Y    X XOR Y
0  0    0
0  1    1
1  0    1
1  1    0


Logical NAND Function

Inputs  Output
X  Y    X NAND Y
0  0    1
0  1    1
1  0    1
1  1    0


Logical NOR Function

Inputs  Output
X  Y    X NOR Y
0  0    1
0  1    0
1  0    0
1  1    0

Notice that when we add 1 + 1 we get 0 with a carry to the next higher place. We
will need to determine the sum and the carry for this design. Notice that the sum column
above matches the output for the XOR function, and that the carry column matches the
output for the AND function. Figure 0-3 (a) shows a simple adder implemented with XOR
and AND gates. Figure 0-3 (b) shows the same logic circuit implemented with AND, OR,
and inverter gates.


(a) Half-Adder Using XOR and AND (b) Half-Adder Using AND, OR, Inverters


Figure 0-3. Two Implementations of a Half-Adder 


Figure 0-4 shows a block diagram 
of a half-adder. Two half-adders can be 
combined to form an adder that can add 
three input digits. This is called a full-adder. 
Figure 0-5 shows the logic diagram of a 
full-adder, along with a block diagram that 
masks the details of the circuit. Figure 0-6 
shows a 3-bit adder using three full-adders. 
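
The adders described above can be modeled in Python as a rough sketch. The half-adder uses XOR for the sum and AND for the carry, as in Figure 0-3 (a); the full-adder chains two half-adders and ORs their carries, which is a common arrangement and is assumed here rather than copied from Figure 0-5. Function names are hypothetical.

```python
def half_adder(x, y):
    """Return (sum, carry) for two input bits: sum is XOR, carry is AND."""
    return x ^ y, x & y


def full_adder(x, y, carry_in):
    """Add three input bits using two half-adders; return (sum, carry_out)."""
    s1, c1 = half_adder(x, y)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2                 # at most one of the two carries can be 1


def add_3bit(a_bits, b_bits):
    """Add two 3-bit numbers given as (bit2, bit1, bit0); return (carry_out, sum_bits)."""
    carry = 0
    sums = []
    for x, y in zip(reversed(a_bits), reversed(b_bits)):  # start at the least significant bit
        s, carry = full_adder(x, y, carry)
        sums.append(s)
    return carry, tuple(reversed(sums))


print(half_adder(1, 1))                    # 1 + 1: sum 0, carry 1
print(add_3bit((1, 0, 1), (0, 1, 1)))      # 5 + 3 = 8, which needs the carry-out bit
```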




Figure 0-5. Full-Adder Built from a Half-Adder 




Decoders 


Another example of the application of 
logic gates is the decoder. Decoders are wide- 
ly used for address decoding in computer 
design. Figure 0-7 shows decoders for 9 (1001 
binary) and 5 (0101) using inverters and AND 
gates. 
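
The two decoders can be modeled in Python as a sketch: each input bit that must be 0 passes through an inverter (written here as 1 - bit), and all four lines are then ANDed. The function names and the a3..a0 input naming are assumptions.

```python
def decode_9(a3, a2, a1, a0):
    """Output 1 only for input 1001: invert the two middle bits, then AND all four."""
    return a3 & (1 - a2) & (1 - a1) & a0


def decode_5(a3, a2, a1, a0):
    """Output 1 only for input 0101: invert a3 and a1, then AND all four."""
    return (1 - a3) & a2 & (1 - a1) & a0


# Try all 16 input patterns and list the ones each decoder responds to.
for n in range(16):
    bits = [(n >> i) & 1 for i in (3, 2, 1, 0)]
    if decode_9(*bits) or decode_5(*bits):
        print(format(n, "04b"), "decode_9 =", decode_9(*bits), "decode_5 =", decode_5(*bits))
```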


Flip-flops 


A widely used component in digital 
systems is the flip-flop. Frequently, flip-flops 
are used to store data. Figure 0-8 shows the 
logic diagram, block diagram, and truth table 
for a flip-flop. 

The D flip-flop (D-FF) is widely used 
to latch data. Notice from the truth table that a 
D-FF grabs the data at the input as the clock is 
activated. A D-FF holds the data as long as the 
power is on. 
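
The latching behavior can be imitated with a small Python class. This sketch assumes a flip-flop that captures D on the rising edge of the clock (clock going from 0 to 1); the actual triggering shown in Figure 0-8 may differ. The class name and the tick method are hypothetical.

```python
class DFlipFlop:
    """A D flip-flop model that latches D on a 0-to-1 clock transition (an assumption)."""

    def __init__(self):
        self.q = 0            # stored bit, available on the Q output
        self._last_clk = 0    # previous clock level, used to detect the rising edge

    def tick(self, d, clk):
        """Apply input d and clock level clk; return the Q output afterwards."""
        if clk == 1 and self._last_clk == 0:
            self.q = d        # grab the data as the clock is activated
        self._last_clk = clk
        return self.q         # otherwise Q simply holds its old value


ff = DFlipFlop()
for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]:
    print("D =", d, "CLK =", clk, "Q =", ff.tick(d, clk))
```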


Figure 0-6. 3-Bit Adder Using Three Full-Adders 


(a) Address decoder for 9 (binary 1001): the output of the AND gate will be 1
if and only if the input is binary 1001.
(b) Address decoder for 5 (binary 0101): the output of the AND gate will be 1
if and only if the input is binary 0101.


Figure 0-7. Address Decoders 


x = don’t care 


(a) Circuit diagram (b) Block diagram (c) Truth table 


Figure 0-8. D Flip-Flops 




Review Questions 


1. The logical ________ operation gives a 1 output when all inputs are 1.
2. The logical ________ operation gives a 1 output when 1 or more of its inputs is 1.
3. The logical ________ operation is often used to compare if two inputs have the
same value.
4. A ________ gate does not change the logic level of the input.
5. Name a common use for flip-flops.
6. An address ________ is used to identify a predetermined binary address.


SECTION 0.3: INSIDE THE COMPUTER 


In this section we provide an introduction to the organization and internal work- 
ing of computers. The model used is generic, but the concepts discussed are applicable to 
all computers, including the IBM PC, PS/2, and compatibles. Before embarking on this 
subject, it will be helpful to review definitions of some of the most widely used terminol- 
ogy in computer literature, such as K, mega, giga, byte, ROM, RAM, and so on. 


Some important terminology 


One of the most important features of a computer is how much memory it has. 
Here we review terms used to describe amounts of memory in x86 PCs and compatibles. 
Recall from the discussion above that a bit is a binary digit that can have the value 0 or 1. 
A byte is defined as 8 bits. A nibble is half a byte, or 4 bits. A word is two bytes, or 16 
bits. In a 32-bit computer a word is 32 bits. The display is intended to show the relative 
size of these units. Of course, they could all be composed of any combination of zeros and 
ones. 


Bit             0
Nibble          0000
Byte            0000 0000
Word (16 bits)  0000 0000 0000 0000
Word (32 bits)  0000 0000 0000 0000 0000 0000 0000 0000


A kilobyte is 2¹⁰ bytes, which is 1024 bytes. The abbreviation K is often used. A
megabyte, or meg as some call it, is 2²⁰ bytes. That is a little over 1 million bytes; it is
exactly 1,048,576 bytes. Moving rapidly up the scale in size, a gigabyte is 2³⁰ bytes (over
1 billion), and a terabyte is 2⁴⁰ bytes (over 1 trillion). As an example of how some of these
terms are used, suppose that a given computer has 16 megabytes of memory. That would be
16 × 2²⁰, or 2⁴ × 2²⁰, which is 2²⁴. Therefore, 16 megabytes is 2²⁴ bytes. See Table 0-4.

Table 0-4: Power of 2

2¹⁰ = 1024  = 1K
2²⁰ = 1024K = 1M
2³⁰ = 1024M = 1G
2⁴⁰ = 1024G = 1T

Two types of memory commonly used in microcomputers are RAM, which stands 
for “random access memory” (sometimes called read/write memory), and ROM, which 
stands for “read-only memory.” RAM is used by the computer for temporary storage of 
programs that it is running. That data is lost when the computer is turned off. For this rea- 
son, RAM is sometimes called volatile memory. ROM contains programs and informa- 
tion essential to operation of the computer. The information in ROM is permanent, can- 
not be changed by the user, and is not lost when the power is turned off. Therefore, it is 
called nonvolatile memory. 




Internal organization of computers 


The internal working of every computer can be broken down into three parts: 
CPU (central processing unit), memory, and I/O (input/output) devices (see Figure 0-9). 
The function of the CPU is to execute (process) information stored in memory. The func- 
tion of I/O devices such as the keyboard and video monitor is to provide a means of com- 
municating with the CPU. The CPU is connected to memory and I/O through strips of 
wire called a bus. The bus carries information from place to place inside a computer just 
as a street bus carries people from place to place. In every computer there are three types 
of buses: address bus, data bus, and control bus. 

For a device (memory or I/O) to be recognized by the CPU, it must be assigned 
an address. The address assigned to a given device must be unique; no two devices are 
allowed to have the same address. The CPU puts the address (in binary, of course) on the 
address bus, and the decoding circuitry finds the device. Then the CPU uses the data bus 
either to get data from that device or to send data to it. The control buses are used to pro- 
vide read or write signals to the device to indicate if the CPU is asking for information or 
sending it information. Of the three buses, the address bus and data bus determine the 
capability of a given CPU. 


Figure 0-9. Inside the Computer


More about the data bus 


Since data lines are used to carry information in and out of a CPU, the more data
lines available, the better the CPU. If one thinks of data lines as highway lanes, it is clear
that more lanes provide a better pathway between the CPU and its external devices (such
as printers, RAM, ROM, etc.; see Figure 0-10). By the same token, that increase in the
number of lanes increases the cost of construction. More data lines mean a more expensive
CPU and computer. The grouping of data lines is called the data bus. The size
of data buses in CPUs varies between 8 and 64 bits. Early computers such as Apple 2 used an
8-bit data bus, while supercomputers such as Cray use a 64-bit data bus. Data buses are
bidirectional, since the CPU must use them either to receive or to send data. The processing
power of a computer is related to the size of its buses, since an 8-bit bus can send out
1 byte at a time, but a 16-bit bus can send out 2 bytes at a time, which is twice as fast.




More about the address bus 


Since the address bus is used to identify the devices and memory connected to the
CPU, the more address lines available, the larger the number of devices that can be
addressed. In other words, the number of address lines for a CPU determines the number
of locations with which it can communicate. The number of locations is always equal
to 2ˣ, where x is the number of address lines, regardless of the size of the data bus. For
example, a CPU with 16 address lines can provide a total of 65,536 (2¹⁶) or 64K bytes of
addressable memory. Each location can have a maximum of 1 byte of data. This is due
to the fact that all general-purpose microprocessor CPUs are what is called byte addressable.
As another example, the IBM PC AT used a CPU with 24 address lines and 16 data
lines. In this case the total accessible memory is 16 megabytes (2²⁴ = 16 megabytes). In
this example there would be 2²⁴ locations, and since each location is one byte, there
would be 16 megabytes of memory. The address bus is a unidirectional bus, which means
that the CPU uses the address bus only to send out addresses. To summarize: The total
number of memory locations addressable by a given CPU is always equal to 2ˣ where x
is the number of address bits, regardless of the size of the data bus.
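
The rule can be checked with a few lines of Python. The names addressable_bytes, KILO, MEGA, and GIGA are hypothetical; the unit sizes follow Table 0-4.

```python
KILO = 2 ** 10   # 1K
MEGA = 2 ** 20   # 1M
GIGA = 2 ** 30   # 1G


def addressable_bytes(address_lines):
    """Number of byte locations a CPU with the given number of address lines can reach."""
    return 2 ** address_lines


print(addressable_bytes(16), "bytes =", addressable_bytes(16) // KILO, "K")
print(addressable_bytes(24), "bytes =", addressable_bytes(24) // MEGA, "megabytes")
```

Note that the data bus width does not appear anywhere in the calculation.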


Figure 0-10. Internal Organization of Computers


CPU and its relation to RAM and ROM 


For the CPU to process information, the data must be stored in RAM or ROM. 
The function of ROM in computers is to provide information that is fixed and permanent. 
This is information such as tables for character patterns to be displayed on the video mon- 
itor, or programs that are essential to the working of the computer, such as programs for 
testing and finding the total amount of RAM installed on the system, or programs to dis- 
play information on the video monitor. In contrast, RAM is used to store information that 
is not permanent and can change with time, such as various versions of the operating sys- 
tem and application packages such as word processing or tax calculation packages. These 
programs are loaded into RAM to be processed by the CPU. The CPU cannot get the 
information directly from the disk since the disk is too slow. In other words, the CPU first 
seeks the information to be processed from RAM (or ROM). Only if it is not there does 
the CPU seek it from a mass storage device such as a disk, and then it transfers the infor- 
mation to RAM. For this reason, RAM and ROM are sometimes referred to as primary 
memory and disks are called secondary memory. Figure 0-11 shows a block diagram of
the internal organization of a CPU.




Figure 0-11. Internal Block Diagram of a CPU


Inside CPUs 


A program stored in memory provides instructions to the CPU to perform an 
action. The action can simply be adding data such as payroll data or controlling a machine 
such as a robot. It is the function of the CPU to fetch these instructions from memory and 
execute them. To perform the actions of fetch and execute, all CPUs are equipped with 
resources such as the following: 


1. Foremost among the resources at the disposal of the CPU are a number of reg- 
isters. The CPU uses registers to store information temporarily. The information could 
be two values to be processed, or the address of the value needed to be fetched from 
memory. Registers inside the CPU can be 8-bit, 16-bit, 32-bit, or even 64-bit regis- 
ters, depending on the CPU. In general, the more and bigger the registers, the better 
the CPU. The disadvantage of more and bigger registers is the increased cost of such 
a CPU. 

2. The CPU also has what is called the ALU (arithmetic/logic unit). The ALU section of 
the CPU is responsible for performing arithmetic functions such as add, subtract, mul- 
tiply, and divide, and logic functions such as AND, OR, and NOT. 

3. Every CPU has what is called a program counter. The function of the program count- 
er is to point to the address of the next instruction to be executed. As each instruction 
is executed, the program counter is incremented to point to the address of the next 
instruction to be executed. The contents of the program counter are placed on the 
address bus to find and fetch the desired instruction. In the IBM PC, the program 
counter is a register called IP, or the instruction pointer. 

4. The function of the instruction decoder is to interpret the instruction fetched into the 
CPU. One can think of the instruction decoder as a kind of dictionary, storing the 
meaning of each instruction and what steps the CPU should take upon receiving a 
given instruction. Just as a dictionary requires more pages the more words it defines,
a CPU capable of understanding more instructions requires more transistors to design.


Internal working of computers


To demonstrate some of the concepts discussed above, a step-by-step analysis of 
the process a CPU would go through to add three numbers is given next. Assume that an 
imaginary CPU has registers called A, B, C, and D. It has an 8-bit data bus and a 16-bit 
address bus. Therefore, the CPU can access memory from addresses 0000 to FFFFH (for 
a total of 10000H locations). The action to be performed by the CPU is to put hexadeci- 
mal value 21 into register A, and then add to register A values 42H and 12H. Assume that 
the code for the CPU to move a value to register A is 1011 0000 (B0H) and the code for
adding a value to register A is 0000 0100 (04H). The necessary steps and code to perform 
them are as follows. 


Action                            Code   Data
Move value 21H into register A    B0H    21H
Add value 42H to register A       04H    42H
Add value 12H to register A       04H    12H


If the program to perform the actions listed above is stored in memory locations 
starting at 1400H, the following would represent the contents for each memory address 
location: 


Memory address Contents of memory address 


1400             (B0) code for moving a value to register A
1401             (21) value to be moved

1402             (04) code for adding a value to register A
1403             (42) value to be added

1404             (04) code for adding a value to register A
1405             (12) value to be added

1406             (F4) code for halt


The actions performed by the CPU to run the program above would be as follows: 


1. The CPU’s program counter can have a value between 0000 and FFFFH. The pro- 
gram counter must be set to the value 1400H, indicating the address of the first 
instruction code to be executed. After the program counter has been loaded with the 
address of the first instruction, the CPU is ready to execute. 

2. The CPU puts 1400H on the address bus and sends it out. The memory circuitry finds 
the location while the CPU activates the READ signal, indicating to memory that it 
wants the byte at location 1400H. This causes the contents of memory location 
1400H, which is B0, to be put on the data bus and brought into the CPU.

3. The CPU decodes the instruction B0 with the help of its instruction decoder dictionary.
When it finds the definition for that instruction it knows it must bring into register A
of the CPU the byte in the next memory location. Therefore, it commands its
controller circuitry to do exactly that. When it brings in value 21H from memory loca- 
tion 1401, it makes sure that the doors of all registers are closed except register A. 
Therefore, when value 21H comes into the CPU it will go directly into register A. 
After completing one instruction, the program counter points to the address of the next 
instruction to be executed, which in this case is 1402H. Address 1402 is sent out on 
the address bus to fetch the next instruction. 




4. From memory location 1402H it fetches code 04H. After decoding, the CPU knows 
that it must add to the contents of register A the byte sitting at the next address (1403). 
After it brings the value (in this case 42H) into the CPU, it provides the contents of 
register A along with this value to the ALU to perform the addition. It then takes the 
result of the addition from the ALU’s output and puts it in register A. Meanwhile the 
program counter becomes 1404, the address of the next instruction. 

5. Address 1404H is put on the address bus and the code is fetched into the CPU, decod- 
ed, and executed. This code is again adding a value to register A. The program count- 
er is updated to 1406H. 

6. Finally, the contents of address 1406 are fetched in and executed. This HALT instruc- 
tion tells the CPU to stop incrementing the program counter and asking for the next 
instruction. In the absence of the HALT, the CPU would continue updating the pro- 
gram counter and fetching instructions. 


Now suppose that address 1403H contained value 04 instead of 42H. How would
the CPU distinguish between data 04 to be added and code 04? Remember that code 04
for this CPU means add the next value to register A. Therefore, the CPU will not try
to decode the next value. It simply adds the contents of the following memory location
to register A, regardless of its value.
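
The fetch, decode, and execute steps above can be imitated with a toy simulator in Python. This is a sketch of the imaginary CPU only: the opcodes B0H, 04H, and F4H and the program at 1400H come from the text, while the function name run, the dictionary used as memory, and the wrap-around of register A at 8 bits are assumptions made for the illustration.

```python
MOV_A = 0xB0   # move the next byte into register A
ADD_A = 0x04   # add the next byte to register A
HALT = 0xF4    # stop fetching instructions


def run(memory, start):
    """Fetch, decode, and execute until HALT; return (register A, final program counter)."""
    pc = start                                  # program counter
    reg_a = 0
    while True:
        opcode = memory[pc]                     # fetch the instruction code
        if opcode == MOV_A:                     # decode and execute
            reg_a = memory[pc + 1]              # operand byte is taken as data, never decoded
            pc += 2
        elif opcode == ADD_A:
            reg_a = (reg_a + memory[pc + 1]) & 0xFF   # assumed 8-bit register A
            pc += 2
        elif opcode == HALT:
            return reg_a, pc
        else:
            raise ValueError("unknown opcode {:02X}H at {:04X}H".format(opcode, pc))


program = {0x1400: 0xB0, 0x1401: 0x21,
           0x1402: 0x04, 0x1403: 0x42,
           0x1404: 0x04, 0x1405: 0x12,
           0x1406: 0xF4}

a, pc = run(program, 0x1400)
print("A = {:02X}H, halted at {:04X}H".format(a, pc))
```

Replacing the byte at 1403H with 04 changes only the data that is added; the simulator, like the CPU described above, never tries to decode an operand byte.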


Review Questions 


1. How many bytes is 24 kilobytes?

2. What does “RAM” stand for? How is it used in computer systems?

3. What does “ROM” stand for? How is it used in computer systems?

4. Why is RAM called volatile memory?

5. List the three major components of a computer system.

6. What does “CPU” stand for? Explain its function in a computer.

7. List the three types of buses found in computer systems and state briefly the purpose
of each type of bus.

8. State which of the following is unidirectional and which is bidirectional.
(a) data bus (b) address bus

9. If an address bus for a given computer has 16 lines, what is the maximum amount of
memory it can access?

10. What does “ALU” stand for? What is its purpose? 

11. How are registers used in computer systems? 

12. What is the purpose of the program counter? 

13. What is the purpose of the instruction decoder? 




PROBLEMS 


SECTION 0.1: NUMBERING AND CODING SYSTEMS 


1. Convert the following decimal numbers to binary. 
(a) 12 (b) 123 (c) 63 (d) 128 (e) 1000
2. Convert the following binary numbers to decimal. 
(a) 100100 (b) 1000001 (c) 11101 (d) 1010 (e) 00100010 
3. Convert the values in Problem 2 to hexadecimal. 
4. Convert the following hex numbers to binary and decimal. 
(a) 2B9H (b) F44H (c) 912H (d) 2BH (e) FFFFH
5. Convert the values in Problem 1 to hex. 




6. Find the 2’s complement of the following binary numbers. 
(a) 1001010 (b) 111001 (c) 10000010 (d) 111110001 
7. Add the following hex values. 
(a) 2CH + 3FH (b) F34H + 5D6H (c) 20000H + 12FFH (d) FFFFH + 2222H
8. Perform hex subtraction for the following.
(a) 24FH - 129H (b) FE9H - 5CCH (c) 2FFFFH - FFFFFH (d) 9FF25H - 4DD99H
9. Show the ASCII codes for numbers 0, 1, 2, 3, ..., 9 in both hex and binary.
10. Show the ASCII code (in hex) for the following:
“U.S.A. is a country” CR, LF 
“in North America” CR, LF 
CR is carriage return 
LF is line feed 


SECTION 0.2: DIGITAL PRIMER 


11. Draw a 3-input OR gate using a 2-input OR gate. 

12. Show the truth table for a 3-input OR gate. 

13. Draw a 3-input AND gate using a 2-input AND gate. 

14. Show the truth table for a 3-input AND gate. 

15. Design a 3-input XOR gate with a 2-input XOR gate. Show the truth table for a 3- 
input XOR. 

16. List the truth table for a 3-input NAND. 

17. List the truth table for a 3-input NOR. 

18. Show the decoder for binary 1100. 

19. Show the decoder for binary 11011. 

20. List the truth table for a D-FF. 


SECTION 0.3: INSIDE THE COMPUTER 


21. Answer the following: 
(a) How many nibbles are 16 bits? 
(b) How many bytes are 32 bits? 
(c) If a word is defined as 16 bits, how many words is a 64-bit data item? 
(d) What is the exact value (in decimal) of 1 meg? 
(e) How many K is 1 meg?
(f) What is the exact value (in decimal) of 1 giga?
(g) How many K is 1 giga?
(h) How many meg is 1 giga?
(i) If a given computer has a total of 8 megabytes of memory, how many
bytes (in decimal) is this? How many kilobytes is this? 

22. A given mass storage device such as a hard disk can store 2 gigabytes of information. 
Assuming that each page of text has 25 rows and each row has 80 columns of ASCII 
characters (each character = 1 byte), approximately how many pages of information
can this disk store? 

23. In a given byte-addressable computer, memory locations 10000H to 9FFFFH are 
available for user programs. The first location is 10000H and the last location is 
9FFFFH. Calculate the following: 

(a) The total number of bytes available (in decimal) 
(b) The total number of kilobytes (in decimal) 

24. A given computer has a 32-bit data bus. What is the largest number that can be carried
into the CPU at a time?




25. Below are listed several computers with their data bus widths. For each computer, list 
the maximum value that can be brought into the CPU at a time (in both hex and dec- 
imal). 

(a) Apple 2 with an 8-bit data bus 
(b) IBM PC with a 16-bit data bus 
(c) IBM PC with a 32-bit data bus 
(d) Cray supercomputer with a 64-bit data bus 

26. Find the total amount of memory, in the units requested, for each of the following 

CPUs, given the size of the address buses. 

(a) 16-bit address bus (in K) 

(b) 24-bit address bus (in megabytes) 

(c) 32-bit address bus (in megabytes and gigabytes) 

(d) 48-bit address bus (in megabytes, gigabytes, and terabytes) 

27. Regarding the data bus and address bus, which is unidirectional and which is bidirec- 
tional? 

28. Which register of the CPU holds the address of the instruction to be fetched? 

29. Which section of the CPU is responsible for performing addition? 

30. List the three bus types present in every CPU. 


ANSWERS TO REVIEW QUESTIONS 


SECTION 0.1: NUMBERING AND CODING SYSTEMS 


1. Computers use the binary system because each bit can have one of two voltage lev- 
els: on and off. 


2. 34₁₀ = 100010₂ = 22₁₆
3. 110101₂ = 35₁₆ = 53₁₀
4. 110001
5. 010100


SECTION 0.2: DIGITAL PRIMER 


1. AND 

2. OR 

3. XOR 

4. Buffer 

5. Storing data 
6. Decoder 


SECTION 0.3: INSIDE THE COMPUTER 


1. 24,576 

2. Random access memory; it is used for temporary storage of programs that the CPU 
is running, such as the operating system, word processing programs, etc. 

3. Read-only memory; it is used for permanent programs such as those that control the 
keyboard, etc. 

4. The contents of RAM are lost when the computer is powered off. 

5. The CPU, memory, and I/O devices 

6. Central processing unit; it can be considered the “brain” of the computer; it executes 
the programs and controls all other devices in the computer. 

7. The address bus carries the location (address) needed by the CPU; the data bus car- 
ries information in and out of the CPU; the control bus is used by the CPU to send 
signals controlling I/O devices. 

8. (a) bidirectional (b) unidirectional 

9. 64K, or 65,536 bytes 

10. Arithmetic/logic unit; it performs all arithmetic and logic operations. 

11. It is for temporary storage of information. 

12. It holds the address of the next instruction to be executed. 

13. It tells the CPU what steps to perform for each instruction. 




CHAPTER 1 


THE x86 MICROPROCESSOR 


OBJECTIVES 
Upon completion of this chapter, you will be able to: 


>> Describe the Intel family of microprocessors from the 8085 to the 
Pentium in terms of bus size, physical memory, and special features 

>> Explain the function of the EU (execution unit) and BIU (bus 
interface unit) 

>> Describe pipelining and how it enables the CPU to work faster 

>> List the registers of the 8086 

>> Code simple MOV and ADD instructions and describe the effect of 
these instructions on their operands 

>> State the purpose of the code segment, data segment, stack segment, 
and extra segment 

>> Explain the difference between a logical address and a physical address 

>> Describe the “little endian” storage convention of x86 microprocessors 

>> State the purpose of the stack 

>> Explain the function of PUSH and POP instructions 

>> List the bits of the flag register and briefly state the purpose of each bit 

>> Demonstrate the effect of ADD instructions on the flag register 

>> List the addressing modes of the 8086 and recognize examples of each 
mode 

>> Know how to use flowcharts and pseudocode in program development 


This chapter examines the architecture of the 8086 with some examples of 
Assembly language programming. Section 1.1 gives a history of the evolution of Intel's 
family of x86 microprocessors. An overview of the internal workings of 8086 micro- 
processors is given in Section 1.2. An introduction to 8086 Assembly language program- 
ming is covered in Section 1.3. Sections 1.4 and 1.5 cover code and stack segments 
respectively and show how physical addresses are generated. Section 1.6 explores the flag 
register and its use in Assembly language programming. Finally, Section 1.7 describes in 
detail the addressing modes of the 8086. 


SECTION 1.1: BRIEF HISTORY OF THE x86 FAMILY 


In this section we trace the evolution of Intel's family of microprocessors from the 
late 1970s, when the personal computer had not yet found widespread acceptance, to the 
powerful microcomputers widely in use today. 


Evolution from 8080/8085 to 8086 


In 1978, Intel Corporation introduced a 16-bit microprocessor called the 8086. 
This processor was a major improvement over the previous generation 8080/8085 series 
Intel microprocessors in several ways. First, the 8086's capacity of 1 megabyte of memo- 
ry exceeded the 8080/8085's capability of handling a maximum of 64K bytes of memory. 
Second, the 8080/8085 was an 8-bit system, meaning that the microprocessor could work 
on only 8 bits of data at a time. Data larger than 8 bits had to be broken into 8-bit pieces 
to be processed by the CPU. In contrast, the 8086 is a 16-bit microprocessor. Third, the 
8086 was a pipelined processor, as opposed to the nonpipelined 8080/8085. In a system 
with pipelining, the data and address buses are busy transferring data while the CPU is 
processing information, thereby increasing the effective processing power of the micro- 
processor. Although pipelining was a common feature of mini- and mainframe computers, 
Intel was a pioneer in putting pipelining on a single-chip microprocessor. Section 1.2 dis- 
cusses pipelining. Table 1-1 shows the evolution of Intel microprocessors up to the 8088. 


Table 1-1: Evolution of Intel’s Microprocessors (from the 8008 to the 8088) 


Product                  8008    8080    8085    8086     8088 
Year introduced          1972    1974    1976    1978     1979 
Technology               PMOS    NMOS    NMOS    NMOS     NMOS 
Number of pins           18      40      40      40       40 
Number of transistors    3000    4500    6500    29,000   29,000 
Number of instructions   66      111     113     133      133 
Physical memory          16K     64K     64K     1M       1M 
Virtual memory           None    None    None    None     None 
Internal data bus        8       8       8       16       16 
External data bus        8       8       8       16       8 
Address bus              8       16      16      20       20 
Data types               8       8       8       8/16     8/16 


Evolution from 8086 to 8088 


The 8086 is a microprocessor with a 16-bit data bus internally and externally, 
meaning that all registers are 16 bits wide and there is a 16-bit data bus to transfer data in 
and out of the CPU. Although the introduction of the 8086 marked a great advancement 
over the previous generation of microprocessors, there was still some resistance in using 
the 16-bit external data bus since at that time all peripherals were designed around an 8- 
bit microprocessor. In addition, a printed circuit board with a 16-bit data bus was much 
more expensive. Therefore, Intel came out with the 8088 version. It is identical to the 
8086 as far as programming is concerned, but externally it has an 8-bit data bus instead of 
a 16-bit bus. It has the same memory capacity, 1 megabyte. 




Success of the 8088 


In 1981, Intel's fortunes changed forever when IBM picked up the 8088 as their 
microprocessor of choice in designing the IBM PC. The 8088-based IBM PC was an enor- 
mous success, largely because IBM and Microsoft (the developer of the MS- 
DOS/Windows operating system) made it an open system, meaning that all documenta- 
tion and specifications of the hardware and software of the PC were made public. This 
made it possible for many other vendors to clone the hardware successfully and thus 
spawned a major growth in both hardware and software designs based on the IBM PC. 
This is in contrast with the Apple computer, which was a closed system, blocking any 
attempt at cloning by other manufacturers, both domestically and overseas. 


Other microprocessors: the 80286, 80386, and 80486 


With a major victory behind Intel and a need from PC users for a more powerful 
microprocessor, Intel introduced the 80286 in 1982. Its features included 16-bit internal 
and external data buses; 24 address lines, which give 16 megabytes of memory (2^24 = 16 
megabytes); and most significantly, virtual memory. The 80286 can operate in one of two 
modes: real mode or protected mode. Real mode is simply a faster 8088/8086 with the 
same maximum of 1 megabyte of memory. Protected mode allows for 16M of memory 
but is also capable of protecting the operating system and programs from accidental or 
deliberate destruction by a user, a feature that is absent in the single-user 8088/8086. 
Virtual memory is a way of fooling the microprocessor into thinking that it has access to 
an almost unlimited amount of memory by swapping data between disk storage and RAM. 
IBM picked up the 80286 for the design of the IBM PC AT, and the clone makers followed 
IBM's lead. 

With users demanding even more powerful systems, in 1985 Intel introduced the 
80386 (sometimes called 80386DX), internally and externally a 32-bit microprocessor 
with a 32-bit address bus. It is capable of handling physical memory of up to 4 gigabytes 
(2^32). Virtual memory was increased to 64 terabytes (2^46). All microprocessors discussed 
so far were general-purpose microprocessors and could not handle mathematical calcula- 
tions rapidly. For this reason, Intel introduced numeric data processing chips, called math 
coprocessors, such as the 8087, 80287, and 80387. Later Intel introduced the 386SX, 
which is internally identical to the 80386 but has a 16-bit external data bus and a 24-bit 
address bus, which gives a capacity of 16 megabytes (2^24) of memory. This makes the 
386SX system much cheaper. With the introduction of the 80486 in 1989, Intel put a great- 
ly enhanced version of the 80386 and the math coprocessor on a single chip plus addition- 
al features such as cache memory. Cache memory is static RAM with a very fast access 
time. Table 1-2 summarizes the evolution of Intel's microprocessors from the 8086 to the 
Pentium Pro. It must be noted that all programs written for the 8088/86 will run on 286, 
386, and 486 computers. 
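
The memory capacities quoted in this section all follow from one rule: n address lines can select 2^n distinct byte locations. The short Python sketch below is only an illustration of that arithmetic and is not part of any x86 tool; the function names addressable_bytes and human_readable are hypothetical helpers invented for this example.

```python
def addressable_bytes(address_lines: int) -> int:
    """Number of distinct byte addresses reachable with the given address lines."""
    return 2 ** address_lines


def human_readable(num_bytes: int) -> str:
    """Format a byte count using the binary units K, M, G, and T (hypothetical helper)."""
    units = ["", "K", "M", "G", "T"]
    index = 0
    # Divide by 1024 until the value is small or we run out of unit names.
    while num_bytes >= 1024 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes}{units[index]}"


if __name__ == "__main__":
    # Address bus widths taken from the text: 8085, 8088/86, 80286, 80386.
    for name, lines in [("8085", 16), ("8088/86", 20), ("80286", 24), ("80386", 32)]:
        print(name, lines, human_readable(addressable_bytes(lines)))
```

Run as a script, the loop prints one line per processor, pairing each address bus width with its capacity in the units used in the text.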


Table 1-2: Evolution of Intel’s Microprocessors (from the 8086 to the Pentium Pro) 


Product                 8086     80286     80386     80486      Pentium    Pentium Pro 
Year introduced         1978     1982      1985      1989       1993       1995 
Technology              NMOS     NMOS      CMOS      CMOS       BICMOS     BICMOS 
Clock rate (MHz)        3-10     10-16     16-33     25-33      60, 66     150 
Number of pins          40       68        132       168        273        387 
Number of transistors   29,000   134,000   275,000   1.2 mill.  3.1 mill.  5.5 mill. 
Physical memory         1M       16M       4G        4G         4G         64G 
Virtual memory          None     1G        64T       64T        64T        64T 
Internal data bus       16       16        32        32         32         32 
External data bus       16       16        32        32         64         64 
Address bus             20       24        32        32         32         36 
Data types              8/16     8/16      8/16/32   8/16/32    8/16/32    8/16/32 




e-source 


See Intel's Microprocessor Quick Reference Guide at: 


http://www.intel.com/pressroom/kits/quickreffam.htm 


Table 1-3: Evolution of Intel Microprocessors: From Pentium II to Itanium 


Product                 Pentium II   Pentium III   Pentium 4   Itanium II 
Year introduced         1997         1999          2000        2002 
Technology              BICMOS       BICMOS        BICMOS      BICMOS 
Number of transistors   7.5 mill.    9.5 mill.     42 mill.    220 mill. 
Cache size              512K         512K          512K        3MB 
Physical memory         64G          64G           64G         64G 
Virtual memory          64T          64T           64T         64T 
Internal data bus       32           32            32          64 
External data bus       64           64            64          64 
Address bus             36           36            36          64 
Data types              8/16/32      8/16/32       8/16/32     8/16/32/64 


Pentium and Pentium Pro 


In 1992, Intel announced release of the newest x86 microprocessor, the Intel 
Pentium. It was given the name Pentium instead of the expected name 80586 because 
numbers cannot be trademarked, whereas a name such as Pentium can be trademarked. By 
using submicron fabrication technology, Intel designers were able to utilize more than 3 
million transistors on the Pentium chip. Upon its release, the Pentium had speeds of 60 
and 66 MHz, but new design features made its processing speed twice that of the 66-MHz 
80486 and over 300 times faster than that of the original 8088. The Pentium processor is 
fully compatible with previous x86 processors but includes several new features, includ- 
ing separate 8K cache memory for code and data, a 64-bit bus, and a vastly improved 
floating-point processor. The Pentium is packaged in a 273-pin PGA chip. It uses BIC- 
MOS technology, which combines the speed of bipolar transistors with the power efficien- 
cy of CMOS technology. Although it has a 64-bit data bus, its registers are 32-bit and it 
has a 32-bit address bus capable of addressing 4 gigabytes of memory. In 1995 Intel intro- 
duced the Pentium Pro, the sixth generation of the x86 family. It is an enhanced version 
of the Pentium that uses 5.5 million transistors. It was designed to be used primarily for 
32-bit servers and workstations. Table 1-3 shows the evolution of Intel's microprocessors 
from the Pentium II to the Itanium II. 


Pentium II 


In 1997 Intel introduced its Pentium II processor. This 7.5-million-transistor 
processor featured MMX (MultiMedia Extension) technology incorporated into the CPU. 
MMX allows for fast graphics and audio processing. In 1998 the Pentium II Xeon proces- 
sor was released. Its primary market is for servers and workstations. In 1999 the Celeron 
was released. Its lower cost and good performance make it ideal for PCs used to meet edu- 
cational and home business needs. 


Pentium III 


In 1999 Intel released the Pentium III. This 9.5-million-transistor processor 
includes 70 new instructions called SIMD that enhance video and audio performance in 
such areas as 3-D imaging, and streaming audio that have become common features of on- 
line computing. In 1999 Intel also introduced the Pentium III Xeon processor, designed 
more for servers and business workstations with multiprocessor configurations. 




Pentium 4 


The Pentium 4, which debuted late in 2000, boasts speeds of 1.4 to 1.5 GHz. 
The Pentium 4 represents the first completely new architecture since the development of 
the Pentium Pro. The new 32-bit architecture, called NetBurst, is designed for heavy mul- 
timedia processing such as video, music, and graphic file manipulation on the Internet. 
The system bus operates at 400 MHz. In addition, new cache and pipelining technology 
and an expansion of the multimedia instruction set are designed to make the P4 a high- 
end media processing microprocessor. 


Intel 64 Architecture 


Intel has selected Itanium as the new brand name for the first product in its 64-bit 
family of processors, formerly called Merced. The evolution of microprocessors is 
increasingly influenced by the evolution of the Internet. The Itanium architecture is 
designed to meet Internet-driven needs for powerful servers and high-performance work- 
stations. The Itanium will have the ability to execute many instructions simultaneously 
plus extremely large memory capabilities. See Chapter 24 for more. 


Review Questions 


1. Name three features of the 8086 that were improvements over the 8080/8085. 

2. What is the major difference between 8088 and 8086 microprocessors? 

3. Give the size of the address bus and physical memory capacity of the following: 

(a) 8086 (b) 80286 (c) 80386 

4. The 80286 is a ____-bit microprocessor, whereas the 80386 is a ____-bit 
microprocessor. 

5. State the major difference between the 80386 and the 80386SX. 

6. List additional features introduced with the 80286 that were not present in the 8086. 
7. List additional features of the 80486 that were not present in the 80386. 

8. List additional features of the Pentium that were not present in the 80486. 

9. How many transistors did the Pentium II use? 

10. Which microprocessor was the first to incorporate MMX technology on-chip? 

11. Give the additional features of the Pentium II that were not present in the Pentium. 
12. Give all the data types supported by the Pentium 4. 

13. Give all the data types supported by the Itanium. 

14. True or false. Itanium has a 64-bit architecture. 


SECTION 1.2: INSIDE THE 8088/86 




In this section we explore concepts important to the internal operation of the 
8088/86, such as pipelining and registers. See the block diagram in Figure 1-1. 


Pipelining 


There are two ways to make the CPU process information faster: increase the 
working frequency or change the internal architecture of the CPU. The first option is tech- 
nology dependent, meaning that the designer must use whatever technology is available 
at the time, with consideration for cost. The technology and materials used in making ICs 
(integrated circuits) determine the working frequency, power consumption, and the num- 
ber of transistors packed into a single-chip microprocessor. More discussion of IC tech- 
nology is given in Chapter 25. It is sufficient for the purpose at hand to say that designers 
can make the CPU work faster by increasing the frequency under which it runs if technol- 
ogy and cost allow. The second option for improving the processing power of the CPU 
has to do with the internal working of the CPU. In the 8085 microprocessor, the CPU 
could either fetch or execute at a given time. In other words, the CPU had to fetch an 
instruction from memory, then execute it and then fetch again, execute it, and so on. 

The idea of pipelining in its simplest form is to allow the CPU to fetch and exe- 
cute at the same time as shown in Figure 1-2. 




[Block diagram not reproduced. Labels: Execution Unit (EU); Bus Interface Unit (BIU); 
address generation and bus control; operands; instruction] 


Figure 1-1. Internal Block Diagram of the 8088/86 CPU 
(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1989) 


Nonpipelined     | fetch 1 | exec 1  | fetch 2 | exec 2  | 
(e.g., 8085) 


Pipelined        | fetch 1 | exec 1  | 
(e.g., 8086)               | fetch 2 | exec 2  | 


Figure 1-2. Pipelined vs. Nonpipelined Execution 




Intel implemented the concept of pipelining in the 8088/86 by splitting the inter- 
nal structure of the microprocessor into two sections: the execution unit (EU) and the bus 
interface unit (BIU). These two sections work simultaneously. The BIU accesses mem- 
ory and peripherals while the EU executes instructions previously fetched. This works 
only if the BIU keeps ahead of the EU; thus the BIU of the 8088/86 has a buffer, or queue 
(see Figure 1-1). The buffer is 4 bytes long in the 8088 and 6 bytes in the 8086. If any 
instruction takes too long to execute, the queue is filled to its maximum capacity and the 
buses will sit idle. The BIU fetches a new instruction whenever the queue has room for 
2 bytes in the 6-byte 8086 queue, and for 1 byte in the 4-byte 8088 queue. In some cir- 
cumstances, the microprocessor must flush out the queue. For example, when a jump 
instruction is executed, the BIU starts to fetch information from the new location in mem- 
ory and information in the queue that was fetched previously is discarded. In this situa- 
tion the EU must wait until the BIU fetches the new instruction. This is referred to in com- 
puter science terminology as a branch penalty. In a pipelined CPU, this means that too 
much jumping around reduces the efficiency of a program. Pipelining in the 8088/86 has 
two stages, fetch and execute, but in more powerful computers pipelining can have many 
stages. The concept of pipelining combined with an increased number of data bus pins 
has, in recent years, led to the design of very powerful microprocessors. 
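
The benefit of overlapping fetch and execute, and the cost of a flushed queue, can be estimated with a deliberately idealized model. The Python sketch below assumes that every fetch and every execute takes exactly one time unit and that each jump costs one extra unit while the EU waits for the BIU to refetch. These are simplifying assumptions for illustration, not measured 8088/86 timings, and the function names are hypothetical.

```python
def nonpipelined_time(instructions: int) -> int:
    """Time units when each instruction is fetched, then executed, in strict sequence."""
    return 2 * instructions


def pipelined_time(instructions: int, jumps: int = 0) -> int:
    """Time units when the fetch of the next instruction overlaps the current execute.

    Idealized assumption: one unit per fetch, one per execute, and one extra
    unit per jump because the prefetched instruction is discarded (branch penalty).
    """
    if instructions == 0:
        return 0
    # The first fetch cannot overlap anything; after that, one instruction
    # completes per time unit unless a jump flushes the queue.
    return instructions + 1 + jumps


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, nonpipelined_time(n), pipelined_time(n))
```

Under these assumptions a long straight-line program approaches twice the speed of the nonpipelined case, and every jump gives back part of that gain, which is why excessive jumping reduces efficiency.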

Registers 

In the CPU, registers are used to store information temporarily. That information could 
be one or two bytes of data to be processed or the address of data. The registers of the 
8088/86 fall into the six categories outlined in Table 1-4. The general-purpose registers in 
8088/86 microprocessors can be accessed as either 16-bit or 8-bit registers. All other 
registers can be accessed only as the full 16 bits. In the 8088/86, data types are either 8 or 
16 bits. To access 12-bit data, for example, a 16-bit register must be used with the highest 
4 bits set to 0. The bits of a register are numbered in descending order, as shown below. 


                      AX (16-bit register) 
        AH (8-bit register) | AL (8-bit register) 


8-bit register: 
    D7 D6 D5 D4 D3 D2 D1 D0 

16-bit register: 
    D15 D14 D13 D12 D11 D10 D9 D8 D7 D6 D5 D4 D3 D2 D1 D0 


Different registers in the 8088/86 are used for different functions, and since some 
instructions use only specific registers to perform their tasks, the use of registers will be 
described in the context of instructions and their application in a given program. The first 
letter of each general register indicates its use. AX is used for the accumulator, BX as a 
base addressing register, CX as a counter in loop operations, and DX to point to data in 
I/O operations. Table 1-4 lists the registers of the 8088/86/286. 


Table 1-4: Registers of the 8088/86/286 by Category 


Category      Bits   Register Names 
General       16     AX, BX, CX, DX 
              8      AH, AL, BH, BL, CH, CL, DH, DL 
Pointer       16     SP (stack pointer), BP (base pointer) 
Index         16     SI (source index), DI (destination index) 
Segment       16     CS (code segment), DS (data segment), 
                     SS (stack segment), ES (extra segment) 
Instruction   16     IP (instruction pointer) 
Flag          16     FR (flag register) 

Note: The general registers can be accessed as the full 16 bits (such as AX), or as the high byte 
only (AH) or low byte only (AL). 
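
The relationship between a 16-bit general register and its two 8-bit halves is plain bit arithmetic: the high byte is bits D15 to D8 and the low byte is bits D7 to D0. The Python sketch below illustrates this with an arbitrary example value; split_register and join_register are hypothetical helper names, and the code only models the register layout, not the CPU itself.

```python
def split_register(value16: int) -> tuple[int, int]:
    """Split a 16-bit register value (such as AX) into (high byte, low byte)."""
    if not 0 <= value16 <= 0xFFFF:
        raise ValueError("value does not fit in a 16-bit register")
    high = (value16 >> 8) & 0xFF  # bits D15..D8, e.g. AH
    low = value16 & 0xFF          # bits D7..D0, e.g. AL
    return high, low


def join_register(high: int, low: int) -> int:
    """Combine a high byte and a low byte into one 16-bit register value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


if __name__ == "__main__":
    ah, al = split_register(0x1234)  # arbitrary example value for AX
    print(f"AH={ah:02X}H AL={al:02X}H")
    print(f"AX={join_register(ah, al):04X}H")
```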




Review Questions 


1. Explain the functions of the EU and the BIU. 

2. What is pipelining, and how does it make the CPU execute faster? 
3. Registers of the 8086 are either ____ bits or ____ bits in length. 
4. List the 16-bit registers of the 8086. 


SECTION 1.3: INTRODUCTION TO ASSEMBLY PROGRAMMING 


While the CPU can work only in binary, it can do so at very high speeds. 
However, it is quite tedious and slow for humans to deal with 0s and 1s in order to 
program the computer. A program that consists of 0s and 1s is called machine language, 
and in the early days of the computer, programmers actually coded programs in machine 
language. Although the hexadecimal system was used as a more efficient way to represent 
binary numbers, the process of working in machine code was still cumbersome for 
humans. Eventually, Assembly languages were developed, which provided mnemonics for 
the machine code instructions, plus other features that made programming faster and less 
prone to error. The term mnemonic is typically used in computer science and engineering 
literature to refer to codes and abbreviations that are relatively easy to remember. 
Assembly language programs must be translated into machine code by a program called 
an assembler. Assembly language is referred to as a low-level language because it deals 
directly with the internal structure of the CPU. To program in Assembly language, the 
programmer must know the number of registers and their size, as well as other details of 
the CPU. 

Today, one can use many different programming languages, such as C/C++, 
BASIC, C#, and numerous others. These languages are called high-level languages 
because the programmer does not have to be concerned with the internal details of the 
CPU. Whereas an assembler is used to translate an Assembly language program into 
machine code (sometimes called object code), high-level languages are translated into 
machine code by a program called a compiler. For instance, to write a program in C, one 
must use a C compiler to translate the program into machine language. 

There are numerous assemblers available for translating x86 Assembly language 
programs into machine code. One of the most commonly used assemblers, MASM by 
Microsoft, is introduced in Chapter 2. The present chapter is designed to correspond to 
Appendix A: DEBUG Programming. The program in this chapter can be entered and run 
with the use of the DEBUG program. If you are not familiar with DEBUG, refer to 
Appendix A for a tutorial introduction. The DEBUG utility is provided with the Microsoft 
Windows operating system and therefore is widely accessible. 


Assembly language programming 


An Assembly language program consists of, among other things, a series of lines 
of Assembly language instructions. An Assembly language instruction consists of a 
mnemonic, optionally followed by one or two operands. The operands are the data items 
being manipulated, and the mnemonics are the commands to the CPU, telling it what to 
do with those items. We introduce Assembly language programming with two widely 
used instructions: the move and add instructions. 


MOV instruction 


Simply stated, the MOV instruction copies data from one location to another. It 
has the following format: 


MOV destination,source ;copy source operand to destination 


This instruction tells the CPU to move (in reality, copy) the source operand to the 
destination operand. For example, the instruction "MOV DX,CX" copies the contents of 
register CX to register DX. After this instruction is executed, register DX will have the 
same value as register CX. The MOV instruction does not affect the source operand. The 


following program first loads CL with value 55H, then moves this value around to vari- 
ous registers inside the CPU. 


MOV CL,55H ;move 55H into register CL 

MOV DL,CL ;copy the contents of CL into DL (now DL=CL=55H) 
MOV AH,DL ;copy the contents of DL into AH (now AH=DL=55H) 
MOV AL,AH ;copy the contents of AH into AL (now AL=AH=55H) 
MOV BH,CL ;copy the contents of CL into BH (now BH=CL=55H) 
MOV CH,BH ;copy the contents of BH into CH (now CH=BH=55H) 


The use of 16-bit registers is demonstrated below. 


MOV CX,468FH ;move 468FH into CX (now CH=46,CL=8F) 
MOV AX,CX ;copy contents of CX to AX (now AX=CX=468FH) 
MOV DX,AX ;copy contents of AX to DX (now DX=AX=468FH) 
MOV BX,DX ;copy contents of DX to BX (now BX=DX=468FH) 
MOV DI,BX ;now DI=BX=468FH 
MOV SI,DI ;now SI=DI=468FH 
MOV DS,SI ;now DS=SI=468FH 
MOV BP,DI ;now BP=DI=468FH 


In the 8086 CPU, data can be moved among all the registers shown in Table 1-4 
(except the flag register) as long as the source and destination registers match in size. 
Code such as "MOV AL,DX" will cause an error, since one cannot move the contents of a 
16-bit register into an 8-bit register. There is no such instruction as "MOV FR,AX". 
Loading the flag register is done through other means, discussed in later chapters. 

If data can be moved among all registers including the segment registers, can data 
be moved directly into all registers? The answer is no. Data can be moved directly into 
nonsegment registers only, using the MOV instruction. For example, look at the follow- 
ing instructions to see which are legal and which are illegal. 


MOV AX,58FCH ;move 58FCH into AX (LEGAL) 
MOV DX,6678H ;move 6678H into DX (LEGAL) 
MOV SI,924BH ;move 924B into SI (LEGAL) 
MOV BP,2459H ;move 2459H into BP (LEGAL) 
MOV DS,2341H ;move 2341H into DS (ILLEGAL) 
MOV CX,8876H ;move 8876H into CX (LEGAL) 
MOV CS,3F47H ;move 3F47H into CS (ILLEGAL) 
MOV BH,99H ;move 99H into BH (LEGAL) 

From the discussion above, note the following three points: 


1. Values cannot be loaded directly into any segment register (CS, DS, ES, or SS). To 
load a value into a segment register, first load it to a nonsegment register and then 
move it to the segment register, as shown next. 


MOV AX,2345H ;load 2345H into AX 
MOV DS,AX ;then load the value of AX into DS 


MOV DI,1400H ;load 1400H into DI 
MOV ES,DI ;then move it into ES, now ES=DI=1400 

2. If a value less than FFH is moved into a 16-bit register, the rest of the bits are assumed 
to be all zeros. For example, in "MOV BX,5" the result will be BX = 0005; that is, 
BH = 00 and BL = 05. 
3. Moving a value that is too large into a register will cause an error. 


MOV BL,7F2H ;ILLEGAL: 7F2H is larger than 8 bits 
MOV AX,2FE456H ;ILLEGAL: the value is larger than AX 
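
The rules for moving an immediate value into a register can be restated as a small check: the destination must not be a segment register, and the value must fit in the register's width. The Python sketch below encodes only those two rules as described above. It is not how an assembler is actually implemented, the name mov_immediate_is_legal is hypothetical, and it assumes unsigned values only (negative immediates are not modeled). FR and IP are omitted from the sketch.

```python
# Register widths in bits, following Table 1-4 (FR and IP are omitted here).
REGISTER_BITS = {
    "AX": 16, "BX": 16, "CX": 16, "DX": 16,
    "AH": 8, "AL": 8, "BH": 8, "BL": 8,
    "CH": 8, "CL": 8, "DH": 8, "DL": 8,
    "SP": 16, "BP": 16, "SI": 16, "DI": 16,
    "CS": 16, "DS": 16, "SS": 16, "ES": 16,
}
SEGMENT_REGISTERS = {"CS", "DS", "SS", "ES"}


def mov_immediate_is_legal(register: str, value: int) -> bool:
    """Return True if 'MOV register,value' obeys the two rules in the text."""
    name = register.upper()
    if name not in REGISTER_BITS:
        raise ValueError(f"unknown register: {register}")
    if name in SEGMENT_REGISTERS:
        return False  # rule 1: no immediate loads into segment registers
    # rule 3: the value must fit in the register (unsigned values assumed)
    return 0 <= value < 2 ** REGISTER_BITS[name]


if __name__ == "__main__":
    for reg, val in [("AX", 0x58FC), ("DS", 0x2341), ("BL", 0x7F2), ("BX", 5)]:
        verdict = "LEGAL" if mov_immediate_is_legal(reg, val) else "ILLEGAL"
        print(f"MOV {reg},{val:X}H  {verdict}")
```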




ADD instruction 
The ADD instruction has the following format: 


ADD destination,source ;ADD the source operand to the destination 


The ADD instruction tells the CPU to add the source and the destination operands 
and put the result in the destination. To add two numbers such as 25H and 34H, each can 
be moved to a register and then added together: 


MOV AL,25H ;move 25 into AL 
MOV BL,34H ;move 34 into BL 
ADD AL,BL ;AL = AL + BL 


Executing the program above results in AL = 59H (25H + 34H = 59H) and BL = 
34H. Notice that the contents of BL do not change. The program above can be written 
in many ways, depending on the registers used. Another way might be: 


MOV DH,25H ;move 25 into DH 
MOV CL,34H ;move 34 into CL 
ADD DH,CL ;add CL to DH: DH = DH + CL 


The program above results in DH = 59H and CL = 34H. There are always many 
ways to write the same program. One question that might come to mind after looking at 
the program above is whether it is necessary to move both data items into registers before 
adding them together. No, it is not necessary. Look at the following variation: 


MOV DH,25H ;load one operand into DH 
ADD DH,34H ;add the second operand to DH 


In the case above, while one register contained one value, the second value fol- 
lowed the instruction as an operand. This is called an immediate operand. The examples 
shown so far for the ADD and MOV instructions show that the source operand can be 
either a register or immediate data. In the examples above, the destination operand has 
always been a register. The format for Assembly language instructions, descriptions of 
their use, and a listing of legal operand types are provided in Appendix B. 

The largest number that an 8-bit register can hold is FFH. To use numbers larger 
than FFH (255 decimal), 16-bit registers such as AX, BX, CX, or DX must be used. For 
example, to add 34EH and 6A5H, the following program can be used: 


MOV AX,34EH ;move 34EH into AX 
MOV DX,6A5H ;move 6A5H into DX 
ADD DX, AX ;add AX to DX: DX = DX + AX 


Running the program above gives DX = 9F3H (34E + 6A5 = 9F3) and AX = 34E. 
Again, any 16-bit nonsegment registers could have been used to perform the action above: 


MOV CX,34EH ;load 34EH into CX 
ADD CX,6A5H ;add 6A5H to CX (now CX=9F3H) 


The general-purpose registers are typically used in arithmetic operations. 
Register AX is sometimes referred to as the accumulator. 
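
The hex sums in the ADD examples above can be verified with a few lines of arithmetic. The Python sketch below models only the value left in the destination register; add_registers is a hypothetical helper name. It keeps the result within the register width by discarding any carry out of the top bit, which is an addition to the text here, since the flag register that records such a carry is covered in Section 1.6.

```python
def add_registers(destination: int, source: int, bits: int = 8) -> int:
    """Return the destination value after 'ADD destination,source'.

    The source is left unchanged, as in the text. The result is limited to
    the register width (8 or 16 bits); any carry out is simply dropped here.
    """
    mask = (1 << bits) - 1
    return (destination + source) & mask


if __name__ == "__main__":
    al = add_registers(0x25, 0x34)              # MOV AL,25H / ADD AL,34H
    dx = add_registers(0x6A5, 0x34E, bits=16)   # MOV DX,6A5H / ADD DX,AX
    print(f"AL={al:02X}H DX={dx:04X}H")
```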


Review Questions 


1. Write the Assembly language instruction to move value 1234H into register BX. 

2. Write the Assembly language instructions to add the values 16H and ABH. Place the 
result in register AX. 
3. No value can be moved directly into which registers? 
4. What is the largest hex value that can be moved into a 16-bit register? Into an 8-bit 

register? What are the decimal equivalents of these hex values? 


SECTION 1.4: INTRODUCTION TO PROGRAM SEGMENTS 


A typical Assembly language program consists of at least three segments: a code 
segment, a data segment, and a stack segment. The code segment contains the Assembly 
language instructions that perform the tasks that the program was designed to accomplish. 
The data segment is used to store information (data) that needs to be processed by the 
instructions in the code segment. The stack is used by the CPU to store information tem- 
porarily. In this section we describe the code and data segments of a program in the con- 
text of some examples and discuss the way data is stored in memory. The stack segment 
is covered in Section 1.5. 


Origin and definition of the segment 


A segment is an area of memory that includes up to 64K bytes and begins on an 
address evenly divisible by 16 (such an address ends in 0H). The segment size of 64K 
bytes came about because the 8085 microprocessor could address a maximum of 64K 
bytes of physical memory since it had only 16 pins for the address lines (2^16 = 64K). This 
limitation was carried into the design of the 8088/86 to ensure compatibility. Whereas in 
the 8085 there was only 64K bytes of memory for all code, data, and stack information, 
in the 8088/86 there can be up to 64K bytes of memory assigned to each category. Within 
an Assembly language program, these categories are called the code segment, data seg- 
ment, and stack segment. For this reason, the 8088/86 can only handle a maximum of 64K 
bytes of code, 64K bytes of data, and 64K bytes of stack at any given time, although it has 
a range of 1 megabyte of memory because of its 20 address pins (2^20 = 1 megabyte). How 
to move this window of 64K bytes to cover all 1 megabyte of memory is discussed below, 
after we discuss logical address and physical address. 


Logical address and physical address 


In Intel literature concerning the 8086, there are three types of addresses men- 
tioned frequently: the physical address, the offset address, and the logical address. The 
physical address is the 20-bit address that is actually put on the address pins of the 8086 
microprocessor and decoded by the memory interfacing circuitry. This address can have 
a range of 00000H to FFFFFH for the 8086 and real-mode 286, 386, and 486 CPUs. This 
is an actual physical location in RAM or ROM within the 1 megabyte memory range. The 
offset address is a location within a 64K-byte segment range. Therefore, an offset address 
can range from 0000H to FFFFH. The logical address consists of a segment value and an 
offset address. The differences among these addresses and the process of converting from
one to another are best understood in the context of some examples, as shown next.


Code segment 


[Diagram: CS = 2500, IP = 95F3]

To execute a program, the 8086 fetches the instructions (opcodes and operands) from
the code segment. The logical address of an instruction always consists of a CS (code
segment) and an IP (instruction pointer), shown in CS:IP format.
The physical address for the location of the instruction is generated by shifting the CS left 
one hex digit and then adding it to the IP. IP contains the offset address. The resulting 20- 
bit address is called the physical address since it is put on the external physical address 
bus pins to be decoded by the memory decoding circuitry. To clarify this important con- 
cept, assume values in CS and IP as shown in the diagram. The offset address is contained 
in IP; in this case it is 95F3H. The logical address is CS:IP, or 2500:95F3H. The physical
address will be 25000 + 95F3 = 2E5F3H. The physical address of an instruction can be
calculated as follows: 




1. Start with CS.              2 5 0 0
2. Shift CS left.              2 5 0 0 0
3. Add IP.                       9 5 F 3
4. Physical address.           2 E 5 F 3


The microprocessor will retrieve the instruction from memory locations starting 
at 2E5F3. Since IP can have a minimum value of 0000H and a maximum of FFFFH, the 
logical address range in this example is 2500:0000 to 2500:FFFF. This means that the 
lowest memory location of the code segment above will be 25000H (25000 + 0000), and
the highest memory location will be 34FFFH (25000 + FFFF). What happens if the
desired instructions are located beyond these two limits? The answer is that the value of 
CS must be changed to access those instructions. See Example 1-1. 
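
The shift-and-add calculation described above can be sketched in a few lines of Python. This is only an illustrative model, not 8086 code; the function names physical_address and segment_range are made up for this sketch, and wrap-around past FFFFFH is deliberately ignored here.

```python
def physical_address(segment, offset):
    """Shift the 16-bit segment value left one hex digit (4 bits)
    and add the 16-bit offset, as the text describes."""
    return (segment << 4) + offset


def segment_range(segment):
    """Lowest and highest physical address of the 64K-byte segment
    (assumes the segment does not wrap past FFFFFH)."""
    return physical_address(segment, 0x0000), physical_address(segment, 0xFFFF)


# Values from the text: CS = 2500, IP = 95F3
print(f"{physical_address(0x2500, 0x95F3):05X}")    # 2E5F3
low, high = segment_range(0x2500)
print(f"{low:05X} to {high:05X}")                    # 25000 to 34FFF
```

The same two functions reproduce the answers of Example 1-1 when called with CS = 24F6 and IP = 634A.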


Example 1-1 


If CS = 24F6H and IP = 634AH, show (a) the logical address, and (b) the offset address. 
Calculate (c) the physical address, (d) the lower range, and (e) the upper range of the 
code segment. 


Solution: 


(a) 24F6:634A      (b) 634A      (c) 2B2AA (24F60 + 634A)
(d) 24F60 (24F60 + 0000)      (e) 34F5F (24F60 + FFFF)


Logical address vs. physical address in the code segment 


In the code segment, CS and IP hold the logical address of the instructions to be 
executed. The following Assembly language instructions have been assembled (translat- 
ed into machine code) and stored in memory. The three columns show the logical address 
of CS:IP, the machine code stored at that address, and the corresponding Assembly lan- 
guage code. 


LOGICAL ADDRESS    MACHINE LANGUAGE       ASSEMBLY LANGUAGE
CS:IP              OPCODE AND OPERAND     MNEMONICS AND OPERAND
1132:0100          B057                   MOV AL,57
1132:0102          B686                   MOV DH,86
1132:0104          B272                   MOV DL,72
1132:0106          89D1                   MOV CX,DX
1132:0108          88C7                   MOV BH,AL
1132:010A          B39F                   MOV BL,9F
1132:010C          B420                   MOV AH,20
1132:010E          01D0                   ADD AX,DX
1132:0110          01D9                   ADD CX,BX
1132:0112          05351F                 ADD AX,1F35


The program above shows that the byte at address 1132:0100 contains B0, which
is the opcode for moving a value into register AL, and address 1132:0101 contains the
operand (in this case 57) to be moved to AL. Therefore, the instruction "MOV AL,57" has
a machine code of B057, where B0 is the opcode and 57 is the operand. Similarly, the




machine code B686 is located in memory locations 1132:0102 and 1132:0103 and
represents the opcode and the operand for the instruction "MOV DH,86". The physical address
is an actual location within RAM (or even ROM). The following are the physical addresses
and the contents of each location for the program above. Remember that it is the
physical address that is put on the address bus by the 8086 CPU to be decoded by the
memory circuitry:

LOGICAL ADDRESS    PHYSICAL ADDRESS    MACHINE CODE CONTENTS
1132:0100          11420               B0
1132:0101          11421               57
1132:0102          11422               B6
1132:0103          11423               86
1132:0104          11424               B2
1132:0105          11425               72
1132:0106          11426               89
1132:0107          11427               D1
1132:0108          11428               88
1132:0109          11429               C7
1132:010A          1142A               B3
1132:010B          1142B               9F
1132:010C          1142C               B4
1132:010D          1142D               20
1132:010E          1142E               01
1132:010F          1142F               D0
1132:0110          11430               01
1132:0111          11431               D9
1132:0112          11432               05
1132:0113          11433               35
1132:0114          11434               1F
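
As a cross-check, the table above can be regenerated from the machine code bytes. The following Python sketch is not part of the original program; it assumes only the CS value (1132), the starting IP (0100), and the 21 bytes listed in the table, and the variable names are chosen for illustration.

```python
# The 21 machine code bytes of the program, in memory order
machine_code = bytes.fromhex("B057B686B27289D188C7B39FB42001D001D905351F")

CS = 0x1132        # code segment value used in the table
START_IP = 0x0100  # offset of the first instruction byte

rows = []
for i, byte in enumerate(machine_code):
    ip = START_IP + i
    phys = (CS << 4) + ip            # shift CS left one hex digit, add IP
    rows.append((f"{CS:04X}:{ip:04X}", f"{phys:05X}", f"{byte:02X}"))

for logical, physical, contents in rows:
    print(logical, physical, contents)
```

Each printed row gives the logical address, the physical address, and the byte stored there, in the same order as the table.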


Data segment 


Assume that a program is being written to add 5 bytes of data, such as 25H, 12H, 
15H, 1FH, and 2BH, where each byte represents a person's daily overtime pay. One way 
to add them is as follows: 


MOV  AL,00H      ;initialize AL
ADD  AL,25H      ;add 25H to AL
ADD  AL,12H      ;add 12H to AL
ADD  AL,15H      ;add 15H to AL
ADD  AL,1FH      ;add 1FH to AL
ADD  AL,2BH      ;add 2BH to AL


In the program above, the data and code are mixed together in the instructions. 
The problem with writing the program this way is that if the data changes, the code must 
be searched for every place the data is included, and the data retyped. For this reason, the 
idea arose to set aside an area of memory strictly for data. In x86 microprocessors, the area 
of memory set aside for data is called the data segment. Just as the code segment is asso- 
ciated with CS and IP as its segment register and offset, the data segment uses register DS 
and an offset value. 

The following demonstrates how data can be stored in the data segment and the 
program rewritten so that it can be used for any set of data. Assume that the offset for the 
data segment begins at 200H. The data is placed in memory locations: 


DS:0200 = 25
DS:0201 = 12
DS:0202 = 15
DS:0203 = 1F
DS:0204 = 2B




and the program can be rewritten as follows: 


MOV  AL,0        ;clear AL
ADD  AL,[0200]   ;add the contents of DS:200 to AL
ADD  AL,[0201]   ;add the contents of DS:201 to AL
ADD  AL,[0202]   ;add the contents of DS:202 to AL
ADD  AL,[0203]   ;add the contents of DS:203 to AL
ADD  AL,[0204]   ;add the contents of DS:204 to AL


Notice that the offset address is enclosed in brackets. The brackets indicate that 
the operand represents the address of the data and not the data itself. If the brackets were
not included, as in "MOV AL,0200", the value 200 itself would be treated as the data to be
moved into AL (and rejected, since 200H does not fit in the 8-bit AL register) instead
of the contents of offset address 200. Keep in mind that there is one important difference
in the format of code for MASM and DEBUG in that DEBUG assumes that all numbers 
are in hex (no "H" suffix is required), whereas MASM assumes that they are in decimal 
and the "H" must be included for hex data. 

This program will run with any set of data. Changing the data has no effect on 
the code. Although this program is an improvement over the preceding one, it can be 
improved even further. If the data had to be stored at a different offset address, say 450H, 
the program would have to be rewritten. One way to solve this problem would be to use 
a register to hold the offset address, and before each ADD, to increment the register to 
access the next byte. Next a decision must be made as to which register to use. The 
8088/86 allows only the use of registers BX, SI, and DI as offset registers for the data seg- 
ment. In other words, while CS uses only the IP register as an offset, DS uses only BX, 
DI, and SI to hold the offset address of the data. The term pointer is often used for a reg- 
ister holding an offset address. In the following example, BX is used as a pointer: 


MOV  AL,0        ;initialize AL
MOV  BX,0200H    ;BX points to offset addr of first byte
ADD  AL,[BX]     ;add the first byte to AL
INC  BX          ;increment BX to point to the next byte
ADD  AL,[BX]     ;add the next byte to AL
INC  BX          ;increment the pointer
ADD  AL,[BX]     ;add the next byte to AL
INC  BX          ;increment the pointer
ADD  AL,[BX]     ;add the next byte to AL
INC  BX          ;increment the pointer
ADD  AL,[BX]     ;add the last byte to AL


The INC instruction adds 1 to (increments) its operand. "INC BX" achieves the 
same result as "ADD BX,1". For the program above, if the offset address where data is 
located is changed, only one instruction will need to be modified and the rest of the pro- 
gram will be unaffected. Examining the program above shows that there is a pattern of two 
instructions being repeated. This leads to the idea of using a loop to repeat certain instruc- 
tions. Implementing a loop requires familiarity with the flag register, discussed later in 
this chapter. 
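
The pointer idea can be modeled outside of Assembly language as well. The Python sketch below is an illustration only: a dictionary stands in for the data segment, the variables al and bx imitate the registers, and a Python for loop stands in for the repeated ADD/INC pair. The function name sum_bytes is hypothetical.

```python
# Data segment contents from the text, keyed by offset address
memory = {0x0200: 0x25, 0x0201: 0x12, 0x0202: 0x15, 0x0203: 0x1F, 0x0204: 0x2B}


def sum_bytes(memory, start_offset, count):
    """Add `count` bytes starting at `start_offset`, keeping the sum
    in an 8-bit accumulator the way AL would."""
    al = 0                      # MOV AL,0
    bx = start_offset           # MOV BX,start_offset
    for _ in range(count):
        al = (al + memory[bx]) & 0xFF   # ADD AL,[BX] (result kept to 8 bits)
        bx = (bx + 1) & 0xFFFF          # INC BX
    return al


print(f"{sum_bytes(memory, 0x0200, 5):02X}")   # 96
```

If the data were moved to another offset, only the start_offset argument would change, which mirrors the point made above about modifying a single instruction.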


Logical address and physical address in the data segment 


The physical address for data is calculated using the same rules as for the code 
segment. That is, the physical address of data is calculated by shifting DS left one hex 
digit and adding the offset value, as shown in Examples 1-2, 1-3, and 1-4. 


Little endian convention 


Previous examples used 8-bit or 1-byte data. In this case the bytes are stored one 
after another in memory. What happens when 16-bit data is used? For example: 


MOV  AX,35F3H    ;load 35F3H into AX
MOV  [1500],AX   ;copy the contents of AX to offset 1500H


In cases like this, the low byte goes to the low memory location and the high byte 




goes to the high memory address. In the example above, memory location DS:1500 con- 
tains F3H and memory location DS:1501 contains 35H (DS:1500 = F3 DS:1501 = 35). 
This convention is called little endian, as opposed to big endian. The origin of the terms
big endian and little endian is from a Gulliver's Travels story about how an egg should be 
opened: from the little end or the big end. In the big endian method, the high byte goes 
to the low address, whereas in the little endian method, the high byte goes to the high 
address and the low byte to the low address. See Example 1-5. All Intel microprocessors 
and many microcontrollers use the little endian convention. Freescale (formerly Motorola) 
microprocessors, along with some other microcontrollers, use big endian. This difference 
might seem as trivial as whether to break an egg from the big end or little end, but it is a 
nuisance in converting software from one camp to be run on a computer of the other camp. 
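
The byte ordering can be demonstrated with a short Python sketch. It is not x86 code; the helper store_word is a made-up name that imitates "MOV [1500],AX", and the standard struct module is used only to show the two byte orders side by side.

```python
import struct


def store_word(memory, offset, value):
    """Store a 16-bit value the little endian way:
    low byte at the low address, high byte at the next address."""
    memory[offset] = value & 0xFF
    memory[offset + 1] = (value >> 8) & 0xFF


memory = {}
store_word(memory, 0x1500, 0x35F3)
print(f"DS:1500 = {memory[0x1500]:02X}  DS:1501 = {memory[0x1501]:02X}")
# DS:1500 = F3  DS:1501 = 35

# The same 16-bit value packed in both conventions
little = struct.pack("<H", 0x35F3)   # little endian: F3 then 35
big = struct.pack(">H", 0x35F3)      # big endian:    35 then F3
print(little.hex(), big.hex())       # f335 35f3
```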


Example 1-2


Assume that DS is 5000 and the offset is 1950. Calculate the physical address.


Solution:


The physical address will be 50000 + 1950 = 51950.

1. Start with DS.              5 0 0 0
2. Shift DS left.              5 0 0 0 0
3. Add the offset.               1 9 5 0
4. Physical address.           5 1 9 5 0


Example 1-3

If DS = 7FA2H and the offset is 438EH, calculate (a) the physical address, (b) the lower 
range, and (c) the upper range of the data segment. Show (d) the logical address. 


Solution: 


(a) 83DAE (7FA20 + 438E)      (b) 7FA20 (7FA20 + 0000)
(c) 8FA1F (7FA20 + FFFF)      (d) 7FA2:438E


Example 1-4


Assume that the DS register is 578C. To access a given byte of data at physical 
memory location 67F66, does the data segment cover the range where the data resides? 
If not, what changes need to be made? 


Solution: 


No, since the range is 578C0 to 678BF, location 67F66 is not included in this range. To 
access that byte, DS must be changed so that its range will include that byte. 
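
The range check in Example 1-4 can be expressed as a small Python test. This is a sketch under the assumption that the segment does not wrap past FFFFFH; the function name covers and the replacement DS value 67F6 are chosen here for illustration and are not taken from the text.

```python
def covers(ds, physical):
    """True if the 64K-byte segment starting at DS*16 includes `physical`."""
    low = ds << 4
    return low <= physical <= low + 0xFFFF


print(covers(0x578C, 0x67F66))   # False: the range is 578C0 to 678BF

# One of many DS values whose range does include 67F66
# (with DS = 67F6 the byte is at offset 0006: 67F60 + 0006 = 67F66)
print(covers(0x67F6, 0x67F66))   # True
```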




Example 1-5 


Assume memory locations with the following contents: DS:6826 = 48 and DS:6827 = 
22. Show the contents of register BX in the instruction “MOV BX,[6826]”. 


Solution: 
According to the little endian convention used in all x86 microprocessors, register BL
should contain the value from the low offset address 6826 and register BH the value
from the offset address 6827, giving BL = 48H and BH = 22H.

DS:6826 = 48          BH = 22     BL = 48
DS:6827 = 22


Extra segment (ES) 


ES is a segment register used as an extra data segment. Although in many nor- 
mal programs this segment is not used, its use is absolutely essential for string operations 
and is discussed in detail in Chapter 6. 


Memory map of the IBM PC 


For a program to be executed on the PC, Windows must first load it into RAM.
Where in RAM will it be loaded? To answer that question, we must first explain some
very important concepts concerning memory in the PC. The 20-bit address of the
8088/86 allows a total of 1 megabyte (1024K bytes) of memory space with the
address range 00000-FFFFF. During the design phase of the first IBM PC, engineers
had to decide on the allocation of the 1-megabyte memory space to various sections
of the PC. This memory allocation is called a memory map. The memory map of the
IBM PC is shown in Figure 1-3. Of this 1 megabyte, 640K bytes from addresses
00000-9FFFFH were set aside for RAM. The 128K bytes from A0000H to BFFFFH
were allocated for video memory. The remaining 256K bytes from C0000H to
FFFFFH were set aside for ROM.

00000H - 9FFFFH      RAM                  640K
A0000H - BFFFFH      Video display RAM    128K
C0000H - FFFFFH      ROM                  256K

Figure 1-3. Memory Allocation in the PC
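
The memory map lends itself to a simple lookup table. The Python sketch below uses only the three address ranges given above; the names MEMORY_MAP, region, and size_in_k are hypothetical and exist only for this illustration.

```python
# (lowest address, highest address, use) for each part of the 1-megabyte map
MEMORY_MAP = [
    (0x00000, 0x9FFFF, "RAM"),
    (0xA0000, 0xBFFFF, "Video display RAM"),
    (0xC0000, 0xFFFFF, "ROM"),
]


def region(physical):
    """Return the use assigned to a 20-bit physical address."""
    for low, high, name in MEMORY_MAP:
        if low <= physical <= high:
            return name
    raise ValueError("address is outside the 1-megabyte range")


def size_in_k(low, high):
    """Size of an address range in K bytes (1K = 1024 bytes)."""
    return (high - low + 1) // 1024


for low, high, name in MEMORY_MAP:
    print(f"{low:05X}-{high:05X}  {name:18s} {size_in_k(low, high)}K")

print(region(0x11420))   # RAM
print(region(0xF0000))   # ROM
```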


More about RAM 


In the early 1980s, most PCs came with only 64K to 256K bytes of RAM mem- 
ory, which was considered more than adequate at the time. Users had to buy memory 
expansion boards to expand memory up to 640K if they needed additional memory. The 
need for expansion depends on the Windows version being used and the memory needs of 
the application software being run. The Windows operating system first allocates the 
available RAM on the PC for its own use and then lets the rest be used for applications 
such as word processors. The complicated task of managing RAM memory is left to 
Windows since the amount of memory used by Windows varies among its various ver- 
sions and since different computers have different amounts of RAM, plus the fact that the 
memory needs of application packages vary. For this reason we do not assign any values 
for the CS, DS, and SS registers since such an assignment means specifying an exact 




physical address in the range 00000-9FFFFH, and this is beyond the knowledge of the 
user. Another reason is that assigning a physical address might work on a given PC but it 
might not work on a PC with a different OS version and RAM size. In other words, the 
program would not be portable to another PC. Therefore, memory management is one of 
the most important functions of the operating system and should be left to Windows. This 
is very important to remember because in many examples in this book we have values for 
the segment registers CS, DS, and SS that will be different from the values that readers 
will get on their PCs. Do not try to assign the value to the segment registers to comply 
with the values in this book. 


Video RAM 


From A0000H to BFFFFH is set aside for video. The amount used and the loca- 
tion vary depending on the video board installed on the PC. Table E-2 of Appendix E lists 
the starting addresses for video boards. 


More about ROM 


From C0000H to FFFFFH is set aside for ROM. Not all the memory space in this 
range is used by the PC's ROM. Of this 256K bytes, only the 64K bytes from location 
F0000H-FFFFFH are used by BIOS (basic input/output system) ROM. Some of the 
remaining space is used by various adapter cards (such as the network card), and the rest 
is free. In recent years, newer versions of Windows have gained some very powerful 
memory management capabilities and can put to good use all the unused memory space 
beyond 640K. The 640K-byte memory space from 00000 to 9FFFFH is referred to as conventional
memory, while the 384K bytes from A0000H to FFFFFH are called the UMB
(upper memory block) in Microsoft literature. 


Function of BIOS ROM 


Since the CPU can only execute programs that are stored in memory, there must 
be some permanent (nonvolatile) memory to hold the programs telling the CPU what to 
do when the power is turned on. This collection of programs held by ROM is referred to 
as BIOS in the PC literature. BIOS, which stands for basic input-output system, contains 
programs to test RAM and other components connected to the CPU. It also contains pro- 
grams that allow Windows to communicate with peripheral devices such as the keyboard, 
video, printer, and disk. It is the function of BIOS to test all the devices connected to the 
PC when the computer is turned on and to report any errors. For example, if the keyboard 
is disconnected from the PC before the computer is turned on, BIOS will report an error 
on the screen, indicating that condition. It is only after testing and setting up the periph- 
erals that BIOS will load Windows from disk into RAM and hand over control of the PC 
to Windows. Although there are occasions when either Windows or applications programs 
need to use programs in BIOS ROM, Windows always controls the PC once it is loaded. 


Review Questions 


1. A segment is an area of memory that includes up to ______ bytes.

2. How large is a segment in the 8086? Can the physical address 346E0 be the starting 
address for a segment? Why or why not? 

3. State the difference between the physical and logical addresses. 

4. A physical address is a ____-bit address; an offset address is a ____-bit address.

5. Which register is used as the offset register with segment register CS? 

6. If BX = 1234H and the instruction "MOV [2400],BX" were executed, what would be
the contents of memory locations at offsets 2400 and 2401?


SECTION 1.5: THE STACK


In this section we examine the concept of the stack, its use in x86 microproces- 
sors, and its implementation in the stack segment. Then more advanced concepts relating 
to segments are discussed, such as overlapping segments. 




What is a stack, and why is it needed?


The stack is a section of read/write memory (RAM) used by the CPU to store 
information temporarily. The CPU needs this storage area since there are only a limited 
number of registers. There must be some place for the CPU to store information safely and 
temporarily. Now one might ask, why not design a CPU with more registers? The reason 
is that in the design of the CPU, every transistor is precious and not enough of them are 
available to build hundreds of registers. In addition, how many registers should a CPU 
have to satisfy every possible program and application? All applications and programming 
techniques are not the same. In a similar manner, it would be too costly in terms of real 
estate and construction costs to build a 50-room house to hold everything one might pos- 
sibly buy throughout his or her lifetime. Instead, one builds or rents a shed for storage. 

Having looked at the advantages of having a stack, what are the disadvantages? 
The main disadvantage of the stack is its access time. Since the stack is in RAM, it takes 
much longer to access compared to the access time of registers. After all, the registers are 
inside the CPU and RAM is outside. This is the reason that some very powerful (and con- 
sequently, expensive) computers do not have a stack; the CPU has a large number of reg- 
isters to work with. 


How stacks are accessed 


If the stack is a section of RAM, there must be registers inside the CPU to point 
to it. The two main registers used to access the stack are the SS (stack segment) register 
and the SP (stack pointer) register. These registers must be loaded before any instructions 
accessing the stack are used. Every register inside the x86 (except segment registers and 
SP) can be stored in the stack and brought back into the CPU from the stack memory. The 
storing of a CPU register in the stack is called a push, and loading the contents of the stack 
into the CPU register is called a pop. In other words, a register is pushed onto the stack to 
store it and popped off the stack to retrieve it. The job of the SP is very critical when push 
and pop are performed. In the x86, the stack pointer register (SP) points at the current 
memory location used for the top of the stack and as data is pushed onto the stack it is 
decremented. It is incremented as data is popped off the stack into the CPU. When an 
instruction pushes or pops a general-purpose register, it must be the entire 16-bit register. 
In other words, one must code "PUSH AX"; there are no instructions such as "PUSH AL"
or "PUSH AH". The reason that the SP is decremented after the push is to make sure that 
the stack is growing downward from upper addresses to lower addresses. This is the 
opposite of the IP (instruction pointer). As was seen in the preceding section, the IP points 
to the next instruction to be executed and is incremented as each instruction is executed. 
To ensure that the code section and stack section of the program never write over each 
other, they are located at opposite ends of the RAM memory set aside for the program and
they grow toward each other but must not meet. If they meet, the program will crash. To 
see how the stack grows, look at the following examples. 


Pushing onto the stack 


Notice in Example 1-6 that as each PUSH is executed, the contents of the regis- 
ter are saved on the stack and SP is decremented by 2. For every byte of data saved on the 
stack, SP is decremented once, and since push is saving the contents of a 16-bit register, 
it is decremented twice. Notice also how the data is stored on the stack. In the x86, the 
lower byte is always stored in the memory location with the lower address. That is the rea- 
son that 24H, the contents of AH, is saved in the memory location with the address 1235 
and AL in location 1234. 


Popping the stack 


Popping the contents of the stack back into the x86 CPU is the opposite process 
of pushing. With every pop, the top 2 bytes of the stack are copied to the register speci- 
fied by the instruction and the stack pointer is incremented twice. Although the data actu- 
ally remains in memory, it is not accessible since the stack pointer is beyond that point. 
Example 1-7 demonstrates the POP instruction. 
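
The behavior of SP during push and pop can be imitated with a small Python model. It is a sketch only: the class name Stack and its methods are invented for this illustration, and the register values are those given in Example 1-6.

```python
class Stack:
    """Toy model of the x86 stack: SP is a 16-bit offset into a
    dictionary that stands in for the stack segment."""

    def __init__(self, sp):
        self.sp = sp
        self.memory = {}

    def push(self, value):
        # SP is decremented by 2, then the 16-bit value is stored
        # with the low byte at the lower address
        self.sp = (self.sp - 2) & 0xFFFF
        self.memory[self.sp] = value & 0xFF
        self.memory[(self.sp + 1) & 0xFFFF] = (value >> 8) & 0xFF

    def pop(self):
        # The top 2 bytes are read, then SP is incremented by 2;
        # the bytes themselves are left in memory
        low = self.memory[self.sp]
        high = self.memory[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF
        return (high << 8) | low


stack = Stack(sp=0x1236)
stack.push(0x24B6)                       # PUSH AX
print(f"SP = {stack.sp:04X}")            # SP = 1234
print(f"{stack.memory[0x1234]:02X} {stack.memory[0x1235]:02X}")   # B6 24
stack.push(0x85C2)                       # PUSH DI
stack.push(0x5F93)                       # PUSH DX
print(f"SP = {stack.sp:04X}")            # SP = 1230
print(f"{stack.pop():04X}")              # 5F93
```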




Example 1-6


Assuming that SP = 1236, AX = 24B6, DI = 85C2, and DX = 5F93, show the contents of the
stack as each of the following instructions is executed.

PUSH AX
PUSH DI
PUSH DX


Solution:

             Start        After        After        After
                          PUSH AX      PUSH DI      PUSH DX
             SP = 1236    SP = 1234    SP = 1232    SP = 1230

SS:1230                                             93
SS:1231                                             5F
SS:1232                                C2           C2
SS:1233                                85           85
SS:1234                   B6           B6           B6
SS:1235                   24           24           24
SS:1236

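Example 1-7


Assuming that the stack is as shown below, and SP = 18FA, show the contents of the stack and
registers as each of the following instructions is executed:

POP CX
POP DX
POP BX

[Figure: contents of stack locations SS:18FA through SS:1900 before and after each POP]


Solution:

Each POP copies 2 bytes from the top of the stack into the named register and adds 2 to SP,
so SP goes from 18FA to 18FC after POP CX, to 18FE after POP DX, and to 1900 after POP BX.
The last two pops give DX = 2C6B and BX = F691.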


Logical address vs. physical address for the stack 


What is the exact physical location of the stack? That depends on the value of the 
stack segment (SS) register and SP, the stack pointer. To compute physical addresses for 
the stack, the same principle is applied as was used for the code and data segments. We 
shift left SS and then add offset SP, the stack pointer register. See Example 1-8. 

What values are assigned to the SP and SS, and who assigns them? It is the job 
of the Windows operating system to assign the values for the SP and SS since memory 
management is the responsibility of the operating system. Before leaving the discussion 
of the stack, two points must be made. First, in the x86 literature, the top of the stack is 
the last stack location occupied. This is different from other CPUs. Second, BP is anoth- 
er register that can be used as an offset into the stack, but it has very special applications 
and is widely used to access parameters passed between Assembly language programs and 
high-level language programs such as C. 




Example 1-8 


If SS = 3500H and the SP is FFFEH, 
(a) Calculate the physical address of the stack. (b) Calculate the lower range. 
(c) Calculate the upper range of the stack segment. (d) Show the stack’s logical address. 


Solution: 
(a) 44FFE (35000 + FFFE) (b) 35000 (35000 + 0000) 
(c) 44FFF (35000 + FFFF) (d) 3500:FFFE 


A few more words about segments in the x86 


Can a single physical address belong to many different logical addresses? Yes.
Look at the case of a physical address value of 15020H. There are many possible logical
addresses that represent this single physical address: 


Logical address (hex)      Physical address (hex)

1000:5020                  15020
1500:0020                  15020
1502:0000                  15020
1400:1020                  15020
1302:2000                  15020


This shows the dynamic behavior of the segment and offset concept in the 8086 
CPU. One last point that must be clarified is the case when adding the offset to the shift- 
ed segment register results in an address beyond the maximum allowed range of FFFFFH. 
In that situation, wrap-around will occur. This is shown in Example 1-9. 
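
Both points, the many logical addresses for one physical address and the wrap-around, can be checked with a short Python sketch. The function names are hypothetical; the 20-bit mask (FFFFFH) models the wrap-around described above.

```python
def physical_address(segment, offset):
    """Shift-and-add, keeping only 20 bits so that results beyond
    FFFFFH wrap around to the bottom of memory."""
    return ((segment << 4) + offset) & 0xFFFFF


def logical_addresses(physical):
    """List every (segment, offset) pair that maps to `physical`."""
    pairs = []
    for segment in range(0x10000):
        offset = (physical - (segment << 4)) & 0xFFFFF
        if offset <= 0xFFFF:          # the offset must fit in 16 bits
            pairs.append((segment, offset))
    return pairs


aliases = logical_addresses(0x15020)
print(len(aliases))                                   # 4096
print((0x1000, 0x5020) in aliases)                    # True
print((0x1302, 0x2000) in aliases)                    # True

# Wrap-around: a segment near the top of memory
print(f"{physical_address(0xFF59, 0xFFFF):05X}")      # 0F58F
```

The brute-force search over all 65,536 segment values is chosen for clarity rather than speed.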


Example 1-9


What is the range of physical addresses if CS = FF59?


Solution:

The low range is FF590 (FF590 + 0000). The range goes to FFFFF and wraps around,
from 00000 to 0F58F (FF590 + FFFF = 0F58F), as shown in the illustration.

[Illustration: the segment runs from FF590 up to FFFFF, then wraps around to 00000
and continues to 0F58F]


Overlapping 


In calculating the physical address, it is possible that two segments can overlap, 
which is desirable in some circumstances. Figure 1-4 illustrates overlapping and nonover- 
lapping segments. 
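
Whether two segments overlap can be decided from the segment register values alone. The Python sketch below assumes that each segment spans a full 64K bytes and that neither wraps past FFFFFH; the function name and the sample register values are chosen for illustration and are not necessarily those of Figure 1-4.

```python
def segments_overlap(seg_a, seg_b):
    """Two full 64K-byte segments overlap when their starting physical
    addresses are less than 64K (10000H) apart. Wrap-around is ignored."""
    start_a = seg_a << 4
    start_b = seg_b << 4
    return abs(start_a - start_b) <= 0xFFFF


# Hypothetical register values
print(segments_overlap(0x2000, 0x2400))   # True:  20000-2FFFF and 24000-33FFF
print(segments_overlap(0x2000, 0x5000))   # False: 20000-2FFFF and 50000-5FFFF
print(segments_overlap(0x2000, 0x3000))   # False: 30000 starts just past 2FFFF
```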


Review Questions 


1. Which registers are used to access the stack? 


2. With each PUSH instruction, the stack pointer register SP is (circle one) increment- 
ed/decremented by 2. 


3. With each POP instruction, SP is (circle one) incremented/decremented by 2. 
4. List three possible logical addresses corresponding to physical address 143F0. 




[Figure: two panels, Nonoverlapping Segments and Overlapping Segments, with segment
register values CS = 2500, CS = 3000, DS = 4050, DS = 6321, SS = 5000, and SS = 8210]

Figure 1-4. Nonoverlapping vs. Overlapping Segments


SECTION 1.6: FLAG REGISTER 


In this section we describe the flag register. Many Assembly language instructions 
alter bits of the flag register and some instructions will function differently based on the 
information in the flag register. After describing the bits of the flag register and their func- 
tion, programming examples are given to demonstrate the use of the flag register. 

The flag register is a 16-bit register sometimes referred to as the status register. 
Although the register is 16 bits wide, only some of the bits are used. The rest are either 
undefined or reserved by Intel. Six of the flags are called conditional flags, meaning that 
they indicate some condition that resulted after an instruction was executed. These six are 
CF, PF, AF, ZF, SF, and OF. The three remaining flags are sometimes called control flags 
since they are used to control the operation of instructions before they are executed. A 
diagram of the flag register is shown in Figure 1-5. 


 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
  R   R   R   R  OF  DF  IF  TF  SF  ZF   U  AF   U  PF   U  CF

R  = reserved             SF = sign flag
U  = undefined            ZF = zero flag
OF = overflow flag        AF = auxiliary carry flag
DF = direction flag       PF = parity flag
IF = interrupt flag       CF = carry flag
TF = trap flag


Figure 1-5. Flag Register 
(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1989) 




Bits of the flag register 


Below are listed the bits of the flag register that are used in x86 Assembly lan- 
guage programming. A brief explanation of each bit is given. How these flag bits are used 
will be seen in programming examples throughout the textbook. 
CF, the Carry Flag. This flag is set whenever there is a carry out, either from d7 after
an 8-bit operation, or from d15 after a 16-bit data operation.

PF, the Parity Flag. After certain operations, the parity of the result's low-order byte is
checked. If the byte has an even number of 1s, the parity flag is set to 1; otherwise, it
is cleared.

AF, Auxiliary Carry Flag. If there is a carry from d3 to d4 of an operation, this bit is 
set; otherwise, it is cleared (set equal to zero). This flag is used by the instructions 
that perform BCD (binary coded decimal) arithmetic. 

ZF, the Zero Flag. The zero flag is set to 1 if the result of an arithmetic or logical opera- 
tion is zero; otherwise, it is cleared. 

SF, the Sign Flag. Binary representation of signed numbers uses the most significant bit 
as the sign bit. After arithmetic or logic operations, the status of this sign bit is copied 
into the SF, thereby indicating the sign of the result. 

TF, the Trap Flag. When this flag is set it allows the program to single-step, meaning to 
execute one instruction at a time. Single-stepping is used for debugging purposes. 

IF, Interrupt Enable Flag. This bit is set or cleared to enable or disable only the exter- 
nal maskable interrupt requests. 

DF, the Direction Flag. This bit is used to control the direction of string operations, 
which are described in Chapter 6. 

OF, the Overflow Flag. This flag is set whenever the result of a signed number opera- 
tion is too large, causing the high-order bit to overflow into the sign bit. In general, 
the carry flag is used to detect errors in unsigned arithmetic operations. The overflow 
flag is only used to detect errors in signed arithmetic operations. 


Example 1-10 


Show how the flag register is affected by the addition of 38H and 2FH. 
Solution: 

MOV  BH,38H      ;BH = 38H
ADD  BH,2FH      ;add 2F to BH, now BH = 67H

      0011 1000
    + 0010 1111
      0110 0111

CF = 0 since there is no carry beyond d7        ZF = 0 since the result is not zero
AF = 1 since there is a carry from d3 to d4     SF = 0 since d7 of the result is zero
PF = 0 since there is an odd number of 1s in the result


Flag register and ADD instruction 


In this section we examine the impact of the ADD instruction on the flag register 
as an example of the use of the flag bits. The flag bits affected by the ADD instruction are 
CF (carry flag), PF (parity flag), AF (auxiliary carry flag), ZF (zero flag), SF (sign flag), 
and OF (overflow flag). The overflow flag will be covered in Chapter 6, since it relates 
only to signed number arithmetic. To understand how each of these flag bits is affected, 
look at Examples 1-10 and 1-11. 

The same concepts apply for 16-bit addition, as shown in Examples 1-12 and 
1-13. It is important to notice the differences between 8-bit and 16-bit operations in terms 
of their impact on the flag bits. The parity bit only counts the lower 8 bits of the result and 
is set accordingly. Also notice the CF bit. The carry flag is set if there is a carry beyond 
bit d15 instead of bit d7. 
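
The rules for CF, PF, AF, ZF, and SF after an ADD can be collected into one small Python function. This is a sketch written for this discussion, not a description of how the CPU is built; the name add_flags is hypothetical, and OF is left out because it is not covered until later in the book.

```python
def add_flags(a, b, bits=8):
    """Add two unsigned values of the given width and return the
    result together with the five flags discussed above."""
    mask = (1 << bits) - 1
    total = a + b
    result = total & mask
    flags = {
        "CF": int(total > mask),                            # carry beyond d7 or d15
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # low byte only
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),             # carry from d3 to d4
        "ZF": int(result == 0),
        "SF": (result >> (bits - 1)) & 1,                   # copy of the top bit
    }
    return result, flags


result, flags = add_flags(0x38, 0x2F)           # the 8-bit addition 38H + 2FH
print(f"{result:02X}", flags)
# 67 {'CF': 0, 'PF': 0, 'AF': 1, 'ZF': 0, 'SF': 0}

result, flags = add_flags(0x34F5, 0x95EB, bits=16)
print(f"{result:04X}", flags)
# CAE0 {'CF': 0, 'PF': 0, 'AF': 1, 'ZF': 0, 'SF': 1}
```

Note that the parity test looks only at the lower 8 bits of the result, while the carry and sign tests depend on the operand width passed in bits.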

Notice the zero flag (ZF) status after the execution of the ADD instruction. Since 
the result of the entire 16-bit operation is zero (meaning the contents of BX), ZF is set to 




Example 1-11


Show how the flag register is affected by
MOV  AL,9CH      ;AL = 9CH
MOV  DH,64H      ;DH = 64H
ADD  AL,DH       ;now AL = 0


Solution:
      1001 1100
    + 0110 0100
      0000 0000
CF = 1 since there is a carry beyond d7         ZF = 1 since the result is zero
AF = 1 since there is a carry from d3 to d4     SF = 0 since d7 of the result is zero
PF = 1 since there is an even number of 1s in the result


Example 1-12


Show how the flag register is affected by
MOV  AX,34F5H    ;AX = 34F5H
ADD  AX,95EBH    ;now AX = CAE0H

Solution:
      34F5      0011 0100 1111 0101
    + 95EB      1001 0101 1110 1011
      CAE0      1100 1010 1110 0000
CF = 0 since there is no carry beyond d15       ZF = 0 since the result is not zero
AF = 1 since there is a carry from d3 to d4     SF = 1 since d15 of the result is one
PF = 0 since there is an odd number of 1s in the lower byte


Example 1-13


Show how the flag register is affected by
MOV  BX,AAAAH    ;BX = AAAAH
ADD  BX,5556H    ;now BX = 0000H

Solution:
      AAAA      1010 1010 1010 1010
    + 5556      0101 0101 0101 0110
      0000      0000 0000 0000 0000
CF = 1 since there is a carry beyond d15        ZF = 1 since the result is zero
AF = 1 since there is a carry from d3 to d4     SF = 0 since d15 of the result is zero
PF = 1 since there is an even number of 1s in the lower byte


Example 1-14 


Show how the flag register is affected by 
MOV  AX,94C2H    ;AX = 94C2H
MOV  BX,323EH    ;BX = 323EH
ADD  AX,BX       ;now AX = C700H
MOV  DX,AX       ;now DX = C700H
MOV  CX,DX       ;now CX = C700H

Solution:
      94C2      1001 0100 1100 0010
    + 323E      0011 0010 0011 1110
      C700      1100 0111 0000 0000
After the ADD operation, the following are the flag bits:
CF = 0 since there is no carry beyond d15       ZF = 0 since the result is not zero
AF = 1 since there is a carry from d3 to d4     SF = 1 since d15 of the result is 1
PF = 1 since there is an even number of 1s in the lower byte


CHAPTER 1: THE x86 MICROPROCESSOR 45 


high. Do all instructions affect the flag bits? The answer is no; some instructions such as 
data transfers (MOV) affect no flags. As an exercise, run these examples on DEBUG to 
see the effect of various instructions on the flag register. 

Running the instructions in Example 1-14 in DEBUG will verify that MOV 
instructions have no effect on the flag. How these flag bits are used in programming is dis- 
cussed in future chapters in the context of many applications. In Appendix B we give addi- 
tional information about the effect of various instructions on the flags. 
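
The flag rules used in the examples above can be checked with a short model. The following is only a sketch in Python, not 8086 code: the helper name add_flags is hypothetical, it covers just the five flags discussed here (OF is left out), and the third addition is taken to be AAAAH + 5556H.

```python
def add_flags(a, b, bits):
    """Model how ADD sets CF, PF, AF, ZF, and SF for 8- or 16-bit operands."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)
    result = total & mask
    flags = {
        "CF": int(total > mask),                            # carry beyond d7 or d15
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even number of 1s in low byte
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),             # carry from d3 to d4
        "ZF": int(result == 0),                             # whole result is zero
        "SF": (result >> (bits - 1)) & 1,                   # copy of d7 or d15
    }
    return result, flags


for a, b, bits in [(0x9C, 0x64, 8), (0x34F5, 0x95EB, 16),
                   (0xAAAA, 0x5556, 16), (0x94C2, 0x323E, 16)]:
    result, flags = add_flags(a, b, bits)
    width = bits // 4
    print(f"{a:X}H + {b:X}H = {result:0{width}X}H", flags)
```

Note that PF looks only at the lower byte even for a 16-bit addition, while CF, ZF, and SF follow the full operand size.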


Use of the zero flag for looping 


One of the most widely used applications of the flag register is the use of the zero 
flag to implement program loops. The term loop refers to a set of instructions that is
repeated a number of times. For example, to add 5 bytes of data, a counter can be used to 
keep track of how many times the loop needs to be repeated. Each time the addition is per- 
formed the counter is decremented and the zero flag is checked. When the counter 
becomes zero, the zero flag is set (ZF = 1) and the loop is stopped. The following shows 
the implementation of the looping concept in the program, which adds 5 bytes of data. 
Register CX is used to hold the counter and BX is the offset pointer (SI or DI could have 
been used instead). AL is initialized before the start of the loop. In each iteration, ZF is 
checked by the JNZ instruction. JNZ stands for "Jump Not Zero", meaning that if ZF = 0,
jump to a new address. If ZF = 1, the jump is not performed and the instruction below the
jump will be executed. Notice that the JNZ instruction must come immediately after the 
instruction that decrements CX since JNZ needs to check the effect of "DEC CX" on ZF. 
If any instruction were placed between them, that instruction might affect the zero flag. 


        MOV CX,05      ;CX holds the loop count
        MOV BX,0200H   ;BX holds the offset data address
        MOV AL,00      ;initialize AL
ADD_LP: ADD AL,[BX]    ;add the next byte to AL
        INC BX         ;increment the data pointer
        DEC CX         ;decrement the loop counter
        JNZ ADD_LP     ;jump to next iteration if counter not zero
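
To trace the loop by hand, it can help to model it outside the assembler. The sketch below is a Python imitation of the five-byte loop, not 8086 code. The helper name sum_bytes and the sample data bytes are assumptions made for illustration.

```python
def sum_bytes(memory, offset, count):
    """Imitate the loop: AL accumulates, BX points, CX counts, JNZ tests ZF."""
    al, bx, cx = 0, offset, count
    while True:
        al = (al + memory[bx]) & 0xFF   # ADD AL,[BX]  (AL holds only 8 bits)
        bx = (bx + 1) & 0xFFFF          # INC BX
        cx = (cx - 1) & 0xFFFF          # DEC CX
        zf = int(cx == 0)               # DEC sets ZF when CX becomes zero
        if zf == 1:                     # JNZ is not taken once ZF = 1
            break
    return al


# Assumed sample data at offsets 0200H-0204H of the data segment
memory = {0x0200: 0x25, 0x0201: 0x12, 0x0202: 0x15, 0x0203: 0x1F, 0x0204: 0x2B}
total = sum_bytes(memory, 0x0200, 5)
print(f"AL = {total:02X}H")
```

As in the assembly version, the test comes at the bottom of the loop, so the body always runs at least once.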


Review Questions 


1. The ADD instruction can affect which bits of the flag register?
2. The carry flag will be set to 1 in an 8-bit ADD if there is a carry out from bit ____.
3. CF will be set to 1 in a 16-bit ADD if there is a carry out from bit ____.


SECTION 1.7: x86 ADDRESSING MODES 


The CPU can access operands (data) in various ways, called addressing modes. 
The number of addressing modes is determined when the microprocessor is designed and 
cannot be changed. The x86 provides a total of seven distinct addressing modes: 
1. register
2. immediate
3. direct
4. register indirect
5. based relative
6. indexed relative
7. based indexed relative
Each addressing mode is explained below, and application examples are given in
later chapters. ADD and MOV instructions are used below to explain addressing modes.


Register addressing mode 


The register addressing mode involves the use of registers to hold the data to be 
manipulated. Memory is not accessed when this addressing mode is executed; therefore, 
it is relatively fast. Examples of register addressing mode follow: 

MOV BX,DX ;copy the contents of DX into BX
MOV ES,AX ;copy the contents of AX into ES
ADD AL,BH ;add the contents of BH to contents of AL


It should be noted that the source and destination registers must match in size. In 
other words, coding "MOV CL,AX" will give an error, since the source is a 16-bit register
and the destination is an 8-bit register. 


Immediate addressing mode 


In the immediate addressing mode, the source operand is a constant. In immedi- 
ate addressing mode, as the name implies, when the instruction is assembled, the operand 
comes immediately after the opcode. For this reason, this addressing mode executes 
quickly. However, in programming it has limited use. Immediate addressing mode can be 
used to load information into any of the registers except the segment registers and flag 
registers. Examples: 


MOV AX,2550H ;move 2550H into AX
MOV CX,625 ;load the decimal value 625 into CX
MOV BL,40H ;load 40H into BL


To move information to the segment registers, the data must first be moved to a 
general-purpose register and then to the segment register. Example: 
MOV AX,2550H
MOV DS,AX
MOV DS,0123H ;illegal! cannot move immediate data into a segment register


In the first two addressing modes, the operands are either inside the microproces- 
sor or tagged along with the instruction. In most programs, the data to be processed is 
often in some memory location outside the CPU. There are many ways of accessing the 
data in the data segment. The following describes those different methods. 


Direct addressing mode 


In the direct addressing mode the data is in some memory location(s) and the 
address of the data in memory comes immediately after the instruction. Note that in imme- 
diate addressing, the operand itself is provided with the instruction, whereas in direct 
addressing mode, the address of the operand is provided with the instruction. This address 
is the offset address and one can calculate the physical address by shifting left the DS reg- 
ister and adding it to the offset as follows: 


MOV DL,[2400] ;move contents of DS:2400H into DL


In this case the physical address is calculated by combining the offset address 2400
with DS, the data segment register. Notice the brackets around the address. Without the
brackets, the instruction gives an error, since it is interpreted as moving the value 2400
(16-bit data) into register DL, an 8-bit register. See Example 1-15.


Example 1-15 


Find the physical address of the memory location and its contents after the execution of the fol- 
lowing, assuming that DS = 1512H. 

MOV AL,99H
MOV [3518],AL


Solution: 


First AL is initialized to 99H, then in line two, the contents of AL are moved to logical address 
DS:3518, which is 1512:3518. Shifting DS left and adding it to the offset gives the physical 
address of 18638H (15120H + 3518H = 18638H). That means after the execution of the second 
instruction, the memory location with address 18638H will contain the value 99H. 
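
The shift-and-add calculation in Example 1-15 can be written as a one-line function. This is a minimal Python sketch, assuming the real-mode 8088/86 scheme described in this chapter. The name physical_address and the dictionary used as memory are illustrative only.

```python
def physical_address(segment, offset):
    """Shift the segment left one hex digit (4 bits) and add the offset."""
    return ((segment << 4) + offset) & 0xFFFFF   # the 8088/86 address bus is 20 bits


memory = {}
al = 0x99                                        # MOV AL,99H
ds = 0x1512
memory[physical_address(ds, 0x3518)] = al        # MOV [3518],AL

for address, value in memory.items():
    print(f"{address:05X}H contains {value:02X}H")
```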




Register indirect addressing mode 


In the register indirect addressing mode, the address of the memory location 
where the operand resides is held by a register. The registers used for this purpose are SI, 
DI, and BX. If these three registers are used as pointers, that is, if they hold the offset of 
the memory location, they must be combined with DS in order to generate the 20-bit phys- 
ical address. For example: 


MOV AL,[BX] ;moves into AL the contents of the memory
            ;location pointed to by DS:BX


Notice that BX is in brackets. In the absence of brackets, the code is interpreted 
as an instruction moving the contents of register BX to AL (which gives an error because 
source and destination do not match) instead of the contents of the memory location 
whose offset address is in BX. The physical address is calculated by shifting DS left one 
hex position and adding BX to it. The same rules apply when using register SI or DI. 


MOV CL,[SI] ;move contents of DS:SI into CL
MOV [DI],AH ;move contents of AH into DS:DI


The examples above moved byte-sized data. Example 1-16 shows 16-bit data. 


Example 1-16 


Assume that DS = 1120, SI = 2498, and AX = 17FE. Show the contents of memory locations 
after the execution of "MOV [SI],AX".


Solution: 


The contents of AX are moved into memory locations with logical address DS:SI and DS:SI + 
1; therefore, the physical address starts at DS (shifted left) + SI = 13698. According to the little 
endian convention, low address 13698H contains FE, the low byte, and high address 13699H 
will contain 17, the high byte. 
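
The byte order in Example 1-16 can be confirmed with a small model. The following Python sketch assumes real-mode addressing, and the helper name store_word is hypothetical. It writes the low byte to the lower address and the high byte to the next one.

```python
def store_word(memory, segment, offset, value):
    """Store a 16-bit value at segment:offset in little endian order."""
    for i, byte in enumerate(value.to_bytes(2, "little")):   # low byte first
        address = ((segment << 4) + ((offset + i) & 0xFFFF)) & 0xFFFFF
        memory[address] = byte
    return memory


# DS = 1120, SI = 2498, AX = 17FE, then MOV [SI],AX
memory = store_word({}, 0x1120, 0x2498, 0x17FE)
for address in sorted(memory):
    print(f"{address:05X}H contains {memory[address]:02X}")
```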


Based relative addressing mode 


In the based relative addressing mode, base registers BX and BP, as well as a dis- 
placement value, are used to calculate what is called the effective address. The default seg- 
ments used for the calculation of the physical address (PA) are DS for BX and SS for BP. 
For example: 


MOV CX,[BX]+10 ;move DS:BX+10 and DS:BX+10+1 into CX
               ;PA = DS (shifted left) + BX + 10


Alternative codings are "MOV CX,[BX+10]" or "MOV CX,10[BX]". Again the
low address contents will go into CL and the high address contents into CH. In the case 
of the BP register, 


MOV AL,[BP]+5 ;PA = SS (shifted left) + BP + 5


Again, alternative codings are "MOV AL,[BP+5]" or "MOV AL,5[BP]". A brief
mention should be made of the terminology effective address used in Intel literature. In
"MOV AL,[BP]+5", BP+5 is called the effective address since the fifth byte from the
beginning of the offset BP is moved to register AL. Similarly in "MOV CX,[BX]+10",
BX+10 is called the effective address.


Indexed relative addressing mode 


The indexed relative addressing mode works the same as the based relative 
addressing mode, except that registers DI and SI hold the offset address. Examples: 




MOV DX,[SI]+5 ;PA = DS (shifted left) + SI + 5
MOV CL,[DI]+20 ;PA = DS (shifted left) + DI + 20


Example 1-17 gives further examples of indexed relative addressing mode. 


Example 1-17


Assume that DS = 4500, SS = 2000, BX = 2100, SI = 1486, DI = 8500, BP = 7814, and AX =
2512. All values are in hex. Show the exact physical memory location where AX is stored in
each of the following.

(a) MOV [BX]+20,AX    (b) MOV [SI]+10,AX
(c) MOV [DI]+4,AX     (d) MOV [BP]+12,AX


Solution: 

In each case PA = segment register (shifted left) + offset register + displacement. 
(a) DS:BX+20 location 47120 = (12) and 47121 = (25) 

(b) DS:SI+10 location 46496 = (12) and 46497 = (25) 

(c) DS:DI+4 location 4D504 = (12) and 4D505 = (25) 

(d) SS:BP+12 location 27826 = (12) and 27827 = (25) 
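
The four answers above follow one rule: add the displacement to the pointer register to get the effective address, then combine it with the default segment (SS for BP, DS otherwise). A Python sketch of that rule follows; the name operand_address and the register dictionary are assumptions for illustration.

```python
REGS = {"DS": 0x4500, "SS": 0x2000, "BX": 0x2100, "SI": 0x1486,
        "DI": 0x8500, "BP": 0x7814, "AX": 0x2512}


def operand_address(regs, pointer, disp):
    """Physical address of [pointer]+disp using the default segment register."""
    segment = "SS" if pointer == "BP" else "DS"     # BP defaults to the stack segment
    effective = (regs[pointer] + disp) & 0xFFFF     # effective (offset) address
    return ((regs[segment] << 4) + effective) & 0xFFFFF


low, high = REGS["AX"] & 0xFF, REGS["AX"] >> 8      # AX is stored low byte first
for label, pointer, disp in [("a", "BX", 0x20), ("b", "SI", 0x10),
                             ("c", "DI", 0x04), ("d", "BP", 0x12)]:
    pa = operand_address(REGS, pointer, disp)
    print(f"({label}) {pa:05X} = ({low:02X}) and {pa + 1:05X} = ({high:02X})")
```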


Based indexed addressing mode 


By combining based and indexed addressing modes, a new addressing mode is 
derived called the based indexed addressing mode. In this mode, one base register and 
one index register are used. Examples: 


MOV CL,[BX][DI]+8 ;PA = DS (shifted left) + BX + DI + 8
MOV CH,[BX][SI]+20 ;PA = DS (shifted left) + BX + SI + 20
MOV AH,[BP][DI]+12 ;PA = SS (shifted left) + BP + DI + 12
MOV AH,[BP][SI]+29 ;PA = SS (shifted left) + BP + SI + 29


The coding of the instructions above can vary; for example, the last example 
could have been written in either of the following two ways: 


MOV AH,[BP+SI+29]
MOV AH,[SI+BP+29] ;the register order does not matter

Note that "MOV AX,[SI][DI]+displacement" is illegal, since SI and DI are both index
registers and this mode requires one base register and one index register.


Table 1-5: Offset Registers for Various Segments

Segment register:     CS     DS            ES            SS
Offset register(s):   IP     SI, DI, BX    SI, DI, BX    SP, BP


In many of the examples above, the MOV instruction was used for the sake of 
clarity, even though one can use any instruction as long as that instruction supports the 
addressing mode. For example, the instruction "ADD DL,[BX]" would add the contents
of the memory location pointed at by DS:BX to the contents of register DL. 


Segment overrides 


Table 1-5 summarizes the offset registers that can be used with the four segment 
registers. The x86 CPU allows the program to override the default segment and use any 
segment register. To do that, specify the segment in the code. For example, in "MOV
AL,[BX]", the physical address of the operand to be moved into AL is DS:BX, as was
shown earlier, since DS is the default segment for pointer BX. To override that default,
specify the desired segment in the instruction as "MOV AL,ES:[BX]". Now the address
of the operand being moved to AL is ES:BX instead of DS:BX. Extensive use of all these
addressing modes is shown in future chapters in the context of program examples.

Table 1-6 shows more examples of segment overrides shown next to the default 
address in the absence of the override. Table 1-7 summarizes addressing modes of the 
8088/86. 


Table 1-6: Sample Segment Overrides

Instruction               Segment Used    Default Segment
MOV AX,CS:[BP]            CS:BP           SS:BP
MOV DX,SS:[SI]            SS:SI           DS:SI
MOV AX,DS:[BP]            DS:BP           SS:BP
MOV CX,ES:[BX]+12         ES:BX+12        DS:BX+12
MOV SS:[BX][DI]+32,AX     SS:BX+DI+32     DS:BX+DI+32
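
The override rule behind these rows is simple enough to state as code. The following Python sketch assumes only what the text says: an explicit segment prefix wins, otherwise an operand that uses BP defaults to SS and every other operand defaults to DS. The function name segment_used is hypothetical.

```python
def segment_used(pointers, override=None):
    """Return the segment register applied to a memory operand."""
    if override is not None:
        return override                        # an explicit prefix always wins
    return "SS" if "BP" in pointers else "DS"  # BP-based operands default to SS


rows = [
    ("MOV AX,CS:[BP]",        ("BP",),      "CS"),
    ("MOV DX,SS:[SI]",        ("SI",),      "SS"),
    ("MOV AX,DS:[BP]",        ("BP",),      "DS"),
    ("MOV CX,ES:[BX]+12",     ("BX",),      "ES"),
    ("MOV SS:[BX][DI]+32,AX", ("BX", "DI"), "SS"),
]
for instruction, pointers, override in rows:
    used = segment_used(pointers, override)
    default = segment_used(pointers)
    print(f"{instruction:24}{used:4}{default}")
```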


Table 1-7: Summary of the x86 Addressing Modes

Addressing Mode           Operand           Default Segment
Register                  reg               none
Immediate                 data              none
Direct                    [offset]          DS
Register indirect         [BX]              DS
                          [SI]              DS
                          [DI]              DS
Based relative            [BX]+disp         DS
                          [BP]+disp         SS
Indexed relative          [DI]+disp         DS
                          [SI]+disp         DS
Based indexed relative    [BX][SI]+disp     DS
                          [BX][DI]+disp     DS
                          [BP][SI]+disp     SS
                          [BP][DI]+disp     SS


Review Questions 


1. Can the x86 programmer make up new addressing modes?
2. Is the IP (instruction pointer) register also available in low-byte and high-byte formats?
3. Is the CS (code segment) register also available in low-byte and high-byte formats?
4. Which segment is used for the direct addressing mode?
5. Which registers can be used for the register indirect addressing mode?


PROBLEMS


SECTION 1.1: BRIEF HISTORY OF THE x86 FAMILY 


1. Which microprocessor, the 8088 or the 8086, was released first? 

2. If the 80286 and 80386SX both have 16-bit external data buses, what is the difference
between them? 

3. What does "16-bit" or "32-bit" microprocessor mean? Does it refer to the internal or 
external data path? 

4. Do programs written for the 8088/86 run on 80286-, 80386-, and 80486-based CPUs? 

5. What does the term upward compatibility mean? 




6. Name a major difference between the 8088 and the 8086. 
7. Which has the larger queue, the 8088 or the 8086? 


SECTION 1.2: INSIDE THE 8088/86 


8. State another way to increase the processing power of the CPU other than increasing 
the frequency. 

9. What do "BIU" and "EU" stand for, and what are their functions? 

10. Name the general-purpose registers of the 8088/86. 


(a) 8-bit (b) 16-bit 
11. Which of the following registers cannot be split into high and low bytes? 
(a) CS (b) AX (c) DS 
(d) SS (e) BX (f) DX 
(g) CX (h) SI (i) DI 


SECTION 1.3: INTRODUCTION TO ASSEMBLY PROGRAMMING 


12. Which of the following instructions cannot be coded in 8088/86 Assembly language? 
Give the reason why not, if any. To verify your answer, code each in DEBUG. 
Assume that all numbers are in hex. 


(a) MOV AX,27 (b) MOV AL,97F (c) MOV DS,9BF2 
(d) MOV CX,397 (e) MOV SI,9516 (f) MOV CS,3490 
(g) MOV DS,BX (h) MOV BX,CS (i) MOV CH,AX 
(j) MOV AX,23FB9 (k) MOV CS,BH (l) MOV AX,DL


SECTION 1.4: INTRODUCTION TO PROGRAM SEGMENTS 


13. Name the segment registers and their functions in the 8088/86. 
14. If CS = 3499H and IP = 2500H, find: 
(a) The logical address (b) The physical address 
(c) The lower and upper ranges of the code segment 

15. Repeat Problem 14 with CS = 1296H and IP = 100H. 

16. If DS = 3499H and the offset = 3FB9H, find: 

(a) The physical address (b) The logical address of the data being fetched 
(c) The lower and upper range addresses of the data segment 

17. Repeat Problem 16 using DS = 1298H and the offset = 7CC8H. 

18. Assume that the physical address for a location is 0046CH. Suggest a possible logi- 
cal address. 

19. If an instruction that needs to be fetched is in physical memory location 389F2 and 
CS = 2700, does the code segment range include it or not? If not, what value should 
be assigned to CS if the IP must equal 1282? 

20. Using DEBUG, assemble and unassemble the following program and provide the log- 
ical address, physical address, and the content of each address location. The CS value 
is decided by Windows, but use IP = 170H. 

MOV AL, 76H 
MOV BH, 8FH 
ADD BH,AL 
ADD BH, 7BH 
MOV BL,BH 
ADD BL,AL 

21. Repeat Problem 20 for the following program from page 36. 
MOV AL,0 ;clear AL
ADD AL,[0200] ;add the contents of DS:200 to AL
ADD AL,[0201] ;add the contents of DS:201 to AL
ADD AL,[0202] ;add the contents of DS:202 to AL
ADD AL,[0203] ;add the contents of DS:203 to AL
ADD AL,[0204] ;add the contents of DS:204 to AL




SECTION 1.5: THE STACK 


22. The stack is: 
(a) A section of ROM 
(b) A section of RAM used for temporary storage 
(c) A 16-bit register inside the CPU 
(d) Some memory inside the CPU 

23. In problem 22, choose the correct answer for the stack pointer. 

24. When data is pushed onto the stack, the stack pointer is ________, but when
data is popped off the stack, the stack pointer is ________.

25. Choose the correct statement: 
(a) The stack segment and code segment start at the same point of read/write memo- 
ry and grow upward. 
(b) The stack segment and code segment start at opposite points of read/write memo- 
ry and grow toward each other. 
(c) There will be no problem if the stack and code segments meet each other. 

26. What is the main disadvantage of the stack as temporary storage compared to having
a large number of registers inside the CPU? 

27. If SS = 2000 and SP = 4578, find: 
(a) The physical address 
(b) The logical address 
(c) The lower range of the stack segment 
(d) The upper range of the stack segment 

28. If SP = 24FC, what is the offset address of the first location of the stack that is avail- 
able to push data into? 

29. Assume that SP = FF2EH, AX = 3291H, BX = F43CH, and CX = 09. Find the con- 
tents of the stack and SP after the execution of each of the following instructions. 


PUSH AX
PUSH BX
PUSH CX


30. Show the sequence of instructions needed to restore all the registers to their original 
values in Problem 29. Show the content of SP at each point. 
31. The following registers are used as offsets. Assuming that the default segment is used
to get the logical address, give the segment register associated with each offset. 
(a) BP (b) DI (c) IP 
(d) SI (e) SP (f) BX 
32. Show the override segment register and the default segment register used (if there 
were no override) in each of the following cases. 
(a) MOV SS:[BX],AX (b) MOV SS:[DI],BX 
(c) MOV DX,DS:[BP+6] 




SECTION 1.6: FLAG REGISTER 


33. Find the status of the CF, PF, AF, ZF, and SF for the following operations. 
(a) MOV BL,9FH (b) MOV AL,23H (c) MOV DX,10FFH
ADD BL,61H ADD AL,97H ADD DX,1 


SECTION 1.7: x86 ADDRESSING MODES 


34. Assume that the registers have the following values (all in hex) and that CS = 1000, 
DS = 2000, SS = 3000, SI = 4000, DI = 5000, BX = 6080, BP = 7000, AX = 25EE, 
CX = 8791, and DX = 1299. Calculate the physical address of the memory where the 
operand is stored and the contents of the memory locations in each of the following 
addressing examples. 


(a) MOV [SI],AL (b) MOV [SI+BX+8],AH
(c) MOV [BX],AX (d) MOV [DI+6],BX 
(e) MOV [DI][BX]+28,CX (f) MOV [BP][SI]+10,DX 




(g) MOV [3600],AX (h) MOV [BX]+30,DX 


(i) MOV [BP]+200,AX (j) MOV [BP+SI+100],BX 
(k) MOV [SI]+50,AH (l) MOV [DI+BP+100],AX
35. Give the addressing mode for each of the following: 
(a) MOV AX,DS (b) MOV BX,5678H 
(c) MOV CX,[3000] (d) MOV AL,CH 
(e) MOV [DI],BX (f) MOV AL, [BX] 
(g) MOV DX,[BP+DI+4] (h) MOV CX,DS 
(i) MOV [BP+6],AL (j) MOV AH,[BX+SI+50]
(k) MOV BL,[SI]+10 (l) MOV [BP][SI]+12,AX
36. Show the contents of the memory locations after the execution of each instruction. 
(a) MOV BX,129FH (b) MOV DX,8C63H 
MOV [1450],BX MOV [2348],DX 
DS:1450 .... DS:2348 .... 
DS:1451 .... DS:2349 .... 


ANSWERS TO REVIEW QUESTIONS 


SECTION 1.1: BRIEF HISTORY OF THE x86 FAMILY 


1. (1) increased memory capacity from 64K to 1 megabyte; 
(2) the 8086 is a 16-bit microprocessor instead of an 8-bit microprocessor; 
(3) the 8086 was a pipelined processor 

2. The 8088 has an 8-bit external data bus whereas the 8086 has a 16-bit data bus. 

3. (a) 20-bit, 1 megabyte (b) 24-bit, 16 megabytes (c) 32-bit, 4 gigabytes 

4. 16, 32 

5. The 80386 has 32-bit address and data buses, whereas the 80386SX has a 24-bit 
address bus and a 16-bit external data bus. 

6. Virtual memory, protected mode 

7. Math coprocessor on the CPU chip, cache memory and controller 

8. A 64-bit bus, faster floating point, and separate cache memory for code and data. 

9. 7.5 million 

10. Pentium II

11. MMX 

12. 8-bit, 16-bit, and 32-bit 

13. 8-bit, 16-bit, 32-bit, and 64-bit 

14. True 


SECTION 1.2: INSIDE THE 8088/86 


1. The execution unit executes instructions; the bus interface unit fetches instructions. 
2. Pipelining divides the microprocessor into two sections: the execution unit and the 
bus interface unit; this allows the CPU to perform these two functions simultaneous- 
ly; that is, the BIU can fetch instructions while the EU executes the instructions pre- 
viously fetched. 

3. 8, 16
4. AX, BX, CX, DX, SP, BP, SI, DI, CS, DS, SS, ES, IP, FR


SECTION 1.3: INTRODUCTION TO ASSEMBLY PROGRAMMING 


1. MOV BX,1234H
2. MOV AX,16H
   ADD AX,ABH
3. The segment registers CS, DS, ES, and SS
4. FFFFH = 65535 (decimal), FFH = 255 (decimal)




SECTION 1.4: INTRODUCTION TO PROGRAM SEGMENTS 


1. 64K
2. A segment contains 64K bytes; yes because 346E0H is evenly divisible by 16.
3. The physical address is the 20-bit address that is put on the address bus to locate a
byte; the logical address is the address in the form xxxx:yyyy, where xxxx is the segment
address and yyyy is the offset into the segment.
4. 20, 16
5. IP
6. 2400 would contain 34 and 2401 would contain 12.


SECTION 1.5: THE STACK 


1. SS is the segment register; SP and BP are used as pointers into the stack.
2. Decremented
3. Incremented
4. 143F:0000, 1000:43F0, 1410:02F0


SECTION 1.6: FLAG REGISTER 


1. CF, PF, AF, ZF, SF, and OF
2. 7
3. 15


SECTION 1.7: x86 ADDRESSING MODES 


1. No
2. No
3. No
4. DS and ES
5. BX, SI, and DI


CHAPTER 2 


ASSEMBLY LANGUAGE 
PROGRAMMING 


OBJECTIVES 
Upon completion of this chapter, you will be able to: 


>> Explain the difference between Assembly language instructions and 
pseudo-instructions 

>> Identify the segments of an Assembly language program

>> Code simple Assembly language instructions 

>> Assemble, link, and run a simple Assembly language program 

>> Code control transfer instructions such as conditional and 
unconditional jumps and call instructions 

>> Code Assembly language data directives for binary, hex, decimal, or 
ASCII data 


>> Write an Assembly language program using either the simplified 
segment definition or the full segment definition 
>> Explore the use of the MASM and emu8086 assemblers 


This chapter is an introduction to Assembly language programming for the x86 
microprocessors. In Section 2.1, the basic form of a program is explained. In Section 2.2 
the steps required to edit, assemble, link, and run a program are explored. Further exam- 
ples of x86 Assembly language programming are given in Section 2.3. Next, control trans- 
fer instructions such as jump and call are discussed in Section 2.4. The data types and data 
directives in x86 Assembly language are explained in Section 2.5. Then the full segment 
definition is discussed in Section 2.6 followed by the discussion of a popular assembler 
called emu8086. Finally, flowchart and pseudocode are explained in Section 2.7. 


SECTION 2.1: DIRECTIVES AND A SAMPLE PROGRAM 


In this section we explain the components of a simple Assembly language pro- 
gram to be assembled by the assembler. A given Assembly language program (see Figure 
2-1) is a series of statements, or lines, which are either Assembly language instructions 
such as ADD and MOV, or statements called directives. Directives (also called pseudo- 
instructions) give directions to the assembler about how it should translate the Assembly 
language instructions into machine code. An Assembly language instruction consists of 
four fields: 


[label:] mnemonic [operands] [;comment]
Brackets indicate that the field is optional. Do not type in the brackets. 


1. The label field allows the program to refer to a line of code by name. The label field 
cannot exceed 31 characters. Labels for directives do not need to end with a colon. A 
label must end with a colon when it refers to an opcode generating instruction; the 
colon indicates to the assembler that this refers to code within this code segment. 
Appendix C, Section 2 gives more information about labels. 


2,3. The Assembly language mnemonic (instruction) and operand(s) fields together per- 
form the real work of the program and accomplish the tasks for which the program 
was written. In Assembly language statements such as 


ADD AL, BL 
MOV AX, 6764 


ADD and MOV are the mnemonic opcodes, and "AL,BL" and "AX,6764" are the 
operands. Instead of a mnemonic and operand, these two fields could contain assembler 
pseudo-instructions, or directives. They are used by the assembler to organize the program 
as well as other output files. Directives do not generate any machine code and are used 
only by the assembler as opposed to instructions, which are translated into machine code 
for the CPU to execute. In Figure 2-1 the commands DB, END, and ENDP are examples 
of directives. 


4. The comment field begins with a ";". Comments may be at the end of a line or on a 
line by themselves. The assembler ignores comments, but they are indispensable to 
programmers. Comments are optional, but are highly recommended to make it easier 
for someone to read and understand the program. 
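
To make the four fields concrete, the sketch below splits an instruction statement into label, mnemonic, operands, and comment. It is written in Python purely for illustration and is not how any assembler is implemented. The function name split_statement is hypothetical, and the sketch assumes an instruction line whose label ends with a colon and that contains no quoted strings with ";" or "," inside them.

```python
def split_statement(line):
    """Split one instruction statement into (label, mnemonic, operands, comment)."""
    code, _, comment = line.partition(";")        # the comment starts at the first ";"
    parts = code.split(None, 1)
    label = ""
    if parts and parts[0].endswith(":"):          # optional label field
        label = parts[0][:-1]
        parts = parts[1].split(None, 1) if len(parts) > 1 else []
    mnemonic = parts[0].upper() if parts else ""
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return label, mnemonic, operands, comment.strip()


print(split_statement("ADD AL,BL"))
print(split_statement("AGAIN: MOV AX,6764 ;load AX"))
print(split_statement(";a comment on a line by itself"))
```

The label AGAIN in the second call is made up for the example. The first token is tested for a trailing colon rather than searching the whole line for ":", so that a segment override such as ES:[BX] in the operand field is not mistaken for a label.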


Model definition 


The first statement in Figure 2-1 after the first two comments is the MODEL 
directive. This directive selects the size of the memory model. Among the options for the 
memory model are SMALL, MEDIUM, COMPACT, and LARGE. 


.MODEL SMALL ;this directive defines the model as small

SMALL is one of the most widely used memory models for Assembly language
programs and is sufficient for the programs in this book. The small model uses a maximum
of 64K bytes of memory for code and another 64K bytes for data. The other models
are defined as follows: 


.MODEL MEDIUM  ;the data must fit into 64K bytes
               ;but the code can exceed 64K bytes of memory
.MODEL COMPACT ;the data can exceed 64K bytes
               ;but the code cannot exceed 64K bytes
.MODEL LARGE   ;both data and code can exceed 64K
               ;but no single set of data should exceed 64K
.MODEL HUGE    ;both code and data can exceed 64K
               ;data items (such as arrays) can exceed 64K
.MODEL TINY    ;used with COM files in which data and code
               ;must fit into 64K bytes


Notice in the above list that MEDIUM and COMPACT are opposites. Also note 
that the TINY model cannot be used with the simplified segment definition described in 
this section. 


Segment definition 


As mentioned in Chapter 1, the x86 CPU has four segment registers: CS (code 
segment), DS (data segment), SS (stack segment), and ES (extra segment). Every line of 
an Assembly language program must correspond to one of these segments. The simplified 
segment definition format uses three simple directives: ".CODE", ".DATA", and
".STACK", which correspond to the CS, DS, and SS registers, respectively. There is another
segment definition style called the full segment definition, described in Section 2.6.


Segments of a program


Although one can write an Assembly language program that uses only one segment,
normally a program consists of at least three segments: the stack segment, the data
segment, and the code segment.


.STACK ;marks the beginning of the stack segment
.DATA ;marks the beginning of the data segment
.CODE ;marks the beginning of the code segment


Assembly language statements are grouped into segments in order to be recog- 
nized by the assembler and consequently by the CPU. The stack segment defines storage 
for the stack, the data segment defines the data that the program will use, and the code 
segment contains the Assembly language instructions. In Chapter 1 we gave an overview 
of how these segments were stored in memory. In the following pages we describe the 
stack, data, and code segments as they are defined in Assembly language programming. 


Stack segment 


The following directive reserves 64 bytes of memory for the stack: 


.STACK 64


Data segment


The data segment in the program of Figure 2-1 defines three data items: DATA1,
DATA2, and SUM. Each is defined as DB (define byte). The DB directive is used by the 
assembler to allocate memory in byte-sized chunks. Memory can be allocated in different 
sizes, such as 2 bytes, which has the directive DW (define word). More of these pseudo- 
instructions are discussed in detail in Section 2.5. The data items defined in the data seg- 
ment will be accessed in the code segment by their labels. DATA1 and DATA2 are given 
initial values in the data section. SUM is not given an initial value, but storage is set aside 
for it. 
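
One way to picture what DB and DW do is to lay the items out as raw bytes. The sketch below does this in Python with the standard struct module. It is an illustration, not assembler output: WDATA is a hypothetical word-sized item added to show DW, and the uninitialized SUM is shown as 00 only as a placeholder.

```python
import struct

# (label, directive, initial value); None stands for "?" (no initial value)
items = [("DATA1", "DB", 0x52), ("DATA2", "DB", 0x29), ("SUM", "DB", None),
         ("WDATA", "DW", 0x1234)]        # WDATA is hypothetical, for illustration
FORMATS = {"DB": "<B", "DW": "<H"}       # one byte, or one little endian word

offsets, image = {}, bytearray()
for label, directive, value in items:
    offsets[label] = len(image)          # offset of the label in the data segment
    image += struct.pack(FORMATS[directive], value or 0)

print(offsets)
print(image.hex(" ").upper())
```

Each label stands for the offset at which its storage begins, which is how the code segment can later refer to DATA1, DATA2, and SUM by name.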




;THE FORM OF AN ASSEMBLY LANGUAGE PROGRAM
;NOTE: USING SIMPLIFIED SEGMENT DEFINITION
        .MODEL SMALL
        .STACK 64
        .DATA
DATA1   DB 52H
DATA2   DB 29H
SUM     DB ?
        .CODE
MAIN    PROC FAR        ;this is the program entry point
        MOV AX,@DATA    ;load the data segment address
        MOV DS,AX       ;assign value to DS
        MOV AL,DATA1    ;get the first operand
        MOV BL,DATA2    ;get the second operand
        ADD AL,BL       ;add the operands
        MOV SUM,AL      ;store the result in location SUM
        MOV AH,4CH      ;set up to return to OS
        INT 21H         ;
MAIN    ENDP
        END MAIN        ;this is the program exit point


Figure 2-1. Simple Assembly Language Program


Code segment definition


The last segment of the program in Figure 2-1 is the code segment. The first line 
of the segment after the .CODE directive is the PROC directive. A procedure is a group 
of instructions designed to accomplish a specific function. A code segment may consist of 
only one procedure, but usually is organized into several small procedures in order to 
make the program more structured. Every procedure must have a name defined by the
PROC directive, followed by the assembly language instructions and closed by the ENDP 
directive. The PROC and ENDP statements must have the same label. The PROC direc- 
tive may have the option FAR or NEAR. The operating system that controls the comput- 
er must be directed to the beginning of the program in order to execute it. The OS requires 
that the entry point to the user program be a FAR procedure. From then on, either FAR or 
NEAR can be used. The differences between a FAR and a NEAR procedure, as well as 
where and why each is used, are explained later in this chapter. For now, just remember 
that in order to run a program, FAR must be used at the program entry point. 

A good question to ask at this point is: What value is actually assigned to the CS, 
DS, and SS registers for execution of the program? The operating system must pass con- 
trol to the program so that it may execute, but before it does that it assigns values for the 
segment registers. The operating system must do this because it knows how much mem- 
ory is installed in the computer, how much of it is used by the system, and how much is 
available. In the IBM PC, the operating system first finds out how many kilobytes of 
RAM memory are installed, allocates some for its own use, and then allows the user pro- 
gram to use the portions that it needs. Various OS versions require different amounts of 
memory, and since the user program must be able to run across different versions, one can- 
not tell the OS to give the program a specific area of memory, say from 25FFF to 289E2. 
Therefore, it is the job of the OS to assign exact values for the segment registers. When 
the program begins executing, of the three segment registers, only CS and SS have the 
proper values. The DS value (and ES, if used) must be initialized by the program. This is 
done as follows: 


MOV AX,@DATA ;DATA refers to the start of the data segment 

MOV DS,AX 

Remember from Chapter 1 that no segment register can be loaded directly with an
immediate value. That is the reason the two lines of code above are needed. You cannot
code "MOV DS,@DATA".

After these housekeeping chores, the Assembly language program instructions
can be coded to perform the desired tasks. The program in Figure 2-1 loads AL and BL
with DATA1 and DATA2, respectively, ADDs them together, and stores the result in SUM.


MOV AL,DATA1
MOV BL,DATA2
ADD AL,BL
MOV SUM,AL


The last two instructions in the shell are "MOV AH,4CH" and "INT 21H". Their
purpose is to return control to the operating system. The last two lines end the procedure
and the program, respectively. Note that the label for ENDP (MAIN) matches the label for
PROC. The END pseudo-instruction ends the entire program by indicating to the OS that
the entry point MAIN has ended. For this reason the labels for the entry point and END
must match.

Figure 2-2 shows a sample shell of an Assembly language program. When writ- 
ing your first few programs, it is handy to keep a copy of this shell on your disk and sim- 
ply fill it in with the instructions and data for your program. 


;THE FORM OF AN ASSEMBLY LANGUAGE PROGRAM
;USING SIMPLIFIED SEGMENT DEFINITION
        .MODEL SMALL
        .STACK 64
        .DATA
        ;
        ;place data definitions here
        ;
        .CODE
MAIN    PROC    FAR             ;this is the program entry point
        MOV     AX,@DATA        ;load the data segment address
        MOV     DS,AX           ;assign value to DS
        ;
        ;place code here
        ;
        MOV     AH,4CH          ;set up to
        INT     21H             ;return to OS
MAIN    ENDP
        END     MAIN            ;this is the program exit point


Figure 2-2. Shell of an Assembly Language Program 


Review Questions 


1. What is the purpose of pseudo-instructions? 
2. __________ are translated by the assembler into machine code, whereas __________
   are not.
3. Write an Assembly language program with the following characteristics:
   (a) A data item named HIGH_DAT, which contains 95
   (b) Instructions that move HIGH_DAT to registers AH, BH, and DL
   (c) A program entry point named START
4. Find the errors in the following:
        .MODEL ENORMOUS
        .STACK
        .CODE
        .DATA
MAIN    PROC    FAR
        MOV     AX,DATA
        MOV     DS,@DATA
        MOV     AL,34H
        ADD     AL,4FH
        MOV     DATA1,AL
START   ENDP
        END


CHAPTER 2: ASSEMBLY LANGUAGE PROGRAMMING 59 


SECTION 2.2: ASSEMBLE, LINK, AND RUN A PROGRAM 


Now that the basic form of an Assembly language program has been given, the 
next question is: How is it created and assembled? The three steps to create an executable 
Assembly language program are outlined as follows: 


Step                       Input         Program          Output
1. Edit the program        keyboard      editor           myfile.asm
2. Assemble the program    myfile.asm    MASM or TASM     myfile.obj
3. Link the program        myfile.obj    LINK or TLINK    myfile.exe


MASM and LINK are the assembler and linker programs for Microsoft's MASM assembler.
If you are using another assembler, such as Borland's TASM, consult the manual for the
procedure to assemble and link a program. Many excellent editors or word processors are
available that can be used to create and/or edit the program. The editor must be able to
produce an ASCII file. Although filenames follow the usual OS conventions, the source
file must end in ".asm" for the assembler used in this book. This ".asm" source file is
assembled by an assembler, such as Microsoft's MASM or Borland's TASM. The assembler
will produce an object file and a list file, along with other files that may be useful to
the programmer. The extension for the object file must be ".obj". This object file is input
to the LINK program, which produces the executable program that ends in ".exe". The
".exe" file can be executed by the microprocessor. Before feeding the ".obj" file into
LINK, all syntax errors reported by the assembler must be corrected. Of course, fixing
these errors will not guarantee that the program will work as intended since the program
may contain conceptual errors. Figure 2-3 shows the steps in producing an executable
file.

Figure 2-4 shows how an executable program is created by following the steps 
outlined above, and then run under DEBUG. The portions in bold indicate what the user 
would type in to perform these steps. Figure 2-4 assumes that the MASM, LINK, and 
DEBUG programs are on drive C and the Assembly language program is on drive A. The 
drives used will vary depending on how the system is set up. 


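[Figure 2-3 diagram: EDITOR PROGRAM produces myfile.asm; ASSEMBLER PROGRAM produces
myfile.lst, myfile.crf, and myfile.obj; myfile.obj and other obj files are linked to
produce myfile.map and myfile.exe]


Figure 2-3. Steps to Create a Program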


.asm and .obj files


The ".asm" file (the source file) is the file created with a word processor or line
editor. The MASM (or other) assembler converts the .asm file's Assembly language
instructions into machine language (the ".obj" object file). In addition to creating the
object program, MASM also creates the ".lst" list file.


.lst file


The ".lst" file, which is optional, is very useful to the programmer because it lists
all the opcodes and offset addresses as well as errors that MASM detected. MASM
assumes that the list file is not wanted (NUL.LST indicates no list). To get a list file, type
in a filename after the prompt. This file can be displayed on the monitor or sent to the
printer. The programmer uses it to help debug the program. It is only after fixing all the
errors indicated in the ".lst" file that the ".obj" file can be input to the LINK program to
create the executable program. One way to look at the list file is to use the following
command at the OS level. This command will print myfile.lst to the monitor, one screen
at a time.


C>type myfile.lst | more


Another way to look at the list file is to bring it into a word processor. Then you
can read it or print it. There are two assembler directives that can be used to make the
".lst" file more readable: PAGE and TITLE.


C>MASM C:MYFILE.ASM <enter> 


Microsoft (R) Macro Assembler Version 5.10 
Copyright (C) Microsoft Corp 1981, 1988. All rights reserved. 


Object filename [C:MYFILE.OBJ]: C: <enter>
Source listing [NUL.LST]: C:MYFILE.LST <enter>
Cross-reference [NUL.CRF]: <enter>


47962 + 413345 Bytes symbol space free 


0 Warning Errors 
0 Severe Errors 


C>LINK C:MYFILE.OBJ <enter> 


Microsoft (R) Overlay Linker Version 3.64 
Copyright (C) Microsoft Corp 1983-1988. All rights reserved.


Run File [C:MYFILE.EXE]: C: <enter>
List File [NUL.MAP]: <enter>
Libraries [.LIB]: <enter>

LINK : warning L4021: no stack segment 


C>DEBUG C:MYFILE.EXE <enter>
-U CS:0 1 <enter>
1064:0000 B86610        MOV     AX,1066
-D 1066:0 F <enter>
1066:0000  52 29 00 00 00 00 00 00-00 00 00 00 00 00 00 00   R)..............
-G <enter>

Program terminated normally
-D 1066:0 F <enter>
1066:0000  52 29 7B 00 00 00 00 00-00 00 00 00 00 00 00 00   R){.............
-Q <enter>

C>


Figure 2-4. Creating and Running the .exe File 
Note: The parts you type in are printed in bold. 


PAGE and TITLE directives 
The format of the PAGE directive is 
PAGE [lines],[columns]


and its function is to tell the printer how the list should be printed. In the default mode, 
meaning that the PAGE directive is coded with no numbers coming after it, the output 
will have 66 lines per page with a maximum of 80 characters per line. In this book, pro- 
grams will change the default settings to 60 and 132 as follows: 


PAGE 60,132 


The range for number of lines is 10 to 255 and for columns is 60 to 132. When
the list is printed and it is more than one page, the assembler can be instructed to print the
title of the program on top of each page. What comes after the TITLE pseudo-instruction
is up to the programmer, but it is common practice to put the name of the program as
stored on the disk immediately after the TITLE pseudo-instruction and then a brief
description of the function of the program. The text after the TITLE pseudo-instruction
cannot be more than 60 ASCII characters.


.crf file 


MASM produces another optional file, the cross-reference, which has the exten- 
sion ".crf". It provides an alphabetical list of all symbols and labels used in the program 
as well as the program line numbers in which they are referenced. This can be a great help 
in large programs with many data segments and code segments. 


LINKing the program 


The assembler (MASM) places the opcodes, operands, and offset addresses
in the ".obj" file. It is the LINK program that produces the ready-to-run version of a
program that has the ".exe" (EXEcutable) extension. The LINK program sets up the file
so that it can be loaded by the OS and executed.

In Figure 2-4 we used DEBUG to execute the program in Figure 2-1 and analyze 
the result. In the program in Figure 2-1, three data items are defined in the data segment. 
Before running the program, one could look at the data in the data segment by dumping 
the contents of DS:offset as shown in Figure 2-4. Now what is the value for the DS reg- 
ister? This can vary from PC to PC and from OS to OS. For this reason it is important to 
look at the value in "MOV AX,xxxx" as was shown and use that number. The result of the 
program can be verified after it is run as shown in Figure 2-4. When the program is work- 
ing successfully, it can be run at the OS level. To execute myfile.exe, simply type in 


C>myfile 


However, since this program produces no output, there would be no way to veri- 
fy the results. When the program name is typed in at the OS level, as shown above, the 
OS loads the program in memory. This is sometimes referred to as mapping, which means 
that the program is mapped into the physical memory of the PC. 


.map file


When there are many segments for code or data, there is a need to see where each 
is located and how many bytes are used by each. This is provided by the map file. This 
file, which is optional, gives the name of each segment, where it starts, where it stops, and 
its size in bytes. 


Download Microsoft Assembler (MASM) and a Tutorial on how to use it 
from the following website: 
http://www.MicroDigitalEd.com 


Review Questions 


1. (a) The input file to the MASM assembler program has the extension __________.
   (b) The input file to the LINK program has the extension __________.

2. Select all the file types from the second column that are the output of the
   program in the first column.


   Editor       (a) .obj    (b) .asm
   Assembler    (c) .exe    (d) .lst
   Linker       (e) .crf    (f) .map




SECTION 2.3: MORE SAMPLE PROGRAMS 


Now that some familiarity with Assembly language programming in the IBM PC 
has been achieved, in this section we look at more example programs in order to allow the 
reader to master the basic features of Assembly programming. The following pages show 
Program 2-1 and the list file generated when the program was assembled. After the pro- 
gram was assembled and linked, DEBUG was used to dump the code segment to see what 
value is assigned to the DS register. Precisely where the OS loads a program into RAM 
depends on many factors, including the amount of RAM on the system and the version of 
OS used. Therefore, remember that the value you get could be different for "MOV 
AX,xxxx" as well as for CS in the program examples. Do not attempt to modify the seg- 
ment register contents to conform to those in the examples, or your system may crash! 


Write, run, and analyze a program that adds 5 bytes of data and saves the result. The data should be 
the following hex numbers: 25, 12, 15, 1F, and 2B. 


        PAGE    60,132
TITLE   PROG2-1 (EXE) PURPOSE: ADDS 5 BYTES OF DATA
        .MODEL SMALL
        .STACK 64
;
        .DATA
DATA_IN DB      25H,12H,15H,1FH,2BH
SUM     DB      ?
;
        .CODE
MAIN    PROC    FAR
        MOV     AX,@DATA
        MOV     DS,AX
        MOV     CX,05             ;set up loop counter CX=5
        MOV     BX,OFFSET DATA_IN ;set up data pointer BX
        MOV     AL,0              ;initialize AL
AGAIN:  ADD     AL,[BX]           ;add next data item to AL
        INC     BX                ;make BX point to next data item
        DEC     CX                ;decrement loop counter
        JNZ     AGAIN             ;jump if loop counter not zero
        MOV     SUM,AL            ;load result into sum
        MOV     AH,4CH            ;set up return
        INT     21H               ;return to OS
MAIN    ENDP
        END     MAIN


After the program was assembled and linked, it was run using DEBUG: 
C>debug prog2-1.exe 

-u cs:0 19
1067:0000 B86610        MOV     AX,1066
1067:0003 8ED8          MOV     DS,AX
1067:0005 B90500        MOV     CX,0005
1067:0008 BB0000        MOV     BX,0000
1067:000B B000          MOV     AL,00
1067:000D 0207          ADD     AL,[BX]
1067:000F 43            INC     BX
1067:0010 49            DEC     CX
1067:0011 75FA          JNZ     000D
1067:0013 A20500        MOV     [0005],AL
1067:0016 B44C          MOV     AH,4C
1067:0018 CD21          INT     21
-d 1066:0 f
1066:0000  25 12 15 1F 2B 00 00 00-00 00 00 00 00 00 00 00
-g

Program terminated normally
-d 1066:0 f
1066:0000  25 12 15 1F 2B 96 00 00-00 00 00 00 00 00 00 00
-q

C>


Program 2-1 




Analysis of Program 2-1 


The DEBUG program is explained thoroughly in Appendix A. The commands 
used in running Program 2-1 were (1) u, to unassemble the code from cs:0 for 19 bytes; 
(2) d, to dump the contents of memory from 1066:0 for the next F bytes; and (3) g, to go, 
that is, run the program. 

Notice in Program 2-1 that when the program was run in DEBUG, the contents of 
the data segment memory were dumped before and after execution of the program to ver- 
ify that the program worked as planned. Normally, it is not necessary to unassemble this 
much code, but it was done here because in later sections of the chapter we examine the 
jump instruction in this program. Also notice that the first 5 bytes dumped above are the 
data items defined in the data segment of the program and the sixth item is the sum of 
those five items, so it appears that the program worked correctly (25H + 12H + 15H +
1FH + 2BH = 96H). Program 2-1 is explained below, instruction by instruction.

"MOV CX, 05" will load the value 05 into the CX register. This register is used by 
the program as a counter for iteration (looping).

"MOV BX,OFFSET DATA_IN" will load into BX the offset address assigned to
DATA_IN. The assembler starts at offset 0000 and uses memory for the data and then assigns
the next available offset memory for SUM (in this case, 0005).

"ADD AL,[BX]" adds the contents of the memory location pointed at by the register
BX to AL. Note that [BX] is a pointer to a memory location.

"INC BX" simply increments the pointer by adding 1 to register BX. This will 
cause BX to point to the next data item, that is, the next byte. 

"DEC CX" will decrement (subtract 1 from) the CX counter and will set the zero 
flag high if CX becomes zero.

"JNZ AGAIN" will jump back to the label AGAIN as long as the zero flag is
indicating that CX is not zero. "JNZ AGAIN" will not jump (that is, execution will resume
with the next instruction after the JNZ instruction) only after the zero flag has been set
high by the "DEC CX" instruction (that is, CX becomes zero). When CX becomes zero,
this means that the loop is completed and all five numbers have been added to AL.
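
The loop can also be traced outside the assembler. The following is a minimal Python sketch, not part of the original program, that uses plain variables to stand in for AL, BX, CX, and the zero flag. The function name run_prog2_1 is made up for illustration, and only the effects described above are modeled.

```python
def run_prog2_1(data_in):
    """Trace the AGAIN loop of Program 2-1 with variables standing in for registers."""
    cx = len(data_in)      # MOV CX,05               (loop counter)
    bx = 0                 # MOV BX,OFFSET DATA_IN   (offset 0000)
    al = 0                 # MOV AL,0
    while True:
        al = (al + data_in[bx]) & 0xFF   # ADD AL,[BX]  (AL is only 8 bits wide)
        bx += 1                          # INC BX
        cx -= 1                          # DEC CX
        zf = 1 if cx == 0 else 0         # DEC raises ZF when the result is zero
        if zf == 1:                      # JNZ AGAIN falls through once ZF = 1
            break
    return al


DATA_IN = [0x25, 0x12, 0x15, 0x1F, 0x2B]
SUM = run_prog2_1(DATA_IN)
print(f"SUM = {SUM:02X}H")
```

With the five data bytes of Program 2-1 the sketch ends with 96H, the same value DEBUG shows at offset 0005 after the program runs.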


Various approaches to Program 2-1 


There are many ways in which any program may be written. The method shown 
for Program 2-1 defined one field of data and used pointer [BX] to access data elements. 
In the method used below, a name is assigned to each data item that will be accessed in 
the program. Variations of Program 2-1 are shown below to clarify the use of addressing 
modes in the context of a real program and also to show that the x86 can use any gener- 
al-purpose register to do arithmetic and logic operations. In earlier-generation CPUs, the 
accumulator had to be the destination of all arithmetic and logic operations, but in the x86 
this is not the case. Since the purpose of these examples is to show different ways of 
accessing operands, it is left to the reader to run and analyze the programs. 


;from the data segment:
DATA1   DB      25H
DATA2   DB      12H
DATA3   DB      15H
DATA4   DB      1FH
DATA5   DB      2BH
SUM     DB      ?

;from the code segment:
        MOV     AL,DATA1        ;MOVE DATA1 INTO AL
        ADD     AL,DATA2        ;ADD DATA2 TO AL
        ADD     AL,DATA3
        ADD     AL,DATA4
        ADD     AL,DATA5
        MOV     SUM,AL          ;SAVE AL IN SUM


There is quite a difference between these two methods of writing the same pro- 
gram. While in the first one the register indirect addressing mode was used to access the 
data, in the second method the direct addressing mode was used. 




Microsoft (R) Macro Assembler Version 5.10
PROG2-1 (EXE) PURPOSE: ADDS 5 BYTES OF DATA                          Page 1-1


 1                                 PAGE 60,132
 2                         TITLE   PROG2-1 (EXE) PURPOSE: ADDS 5 BYTES OF DATA
 3                                 .MODEL SMALL
 4                                 .STACK 64
 5                         ;
 6                                 .DATA
 7 0000 25 12 15 1F 2B     DATA_IN DB 25H,12H,15H,1FH,2BH
 8 0005 00                 SUM     DB ?
 9                         ;
10                                 .CODE
11 0000                    MAIN    PROC FAR
12 0000 B8 ---- R                  MOV AX,@DATA
13 0003 8E D8                      MOV DS,AX
14 0005 B9 0005                    MOV CX,05              ;set up loop counter CX=5
15 0008 BB 0000 R                  MOV BX,OFFSET DATA_IN  ;set up data pointer BX
16 000B B0 00                      MOV AL,0               ;initialize AL
17 000D 02 07              AGAIN:  ADD AL,[BX]            ;add next data item to AL
18 000F 43                         INC BX                 ;make BX point to next data item
19 0010 49                         DEC CX                 ;decrement loop counter
20 0011 75 FA                      JNZ AGAIN              ;jump if loop counter not zero
21 0013 A2 0005 R                  MOV SUM,AL             ;load result into sum
22 0016 B4 4C                      MOV AH,4CH             ;set up return
23 0018 CD 21                      INT 21H                ;return to OS
24 001A                    MAIN    ENDP
25                                 END MAIN


Microsoft (R) Macro Assembler Version 5.10
PROG2-1 (EXE) PURPOSE: ADDS 5 BYTES OF DATA                          Symbols-1


Segments and Groups:
Name            Length   Align   Combine  Class

DGROUP                   GROUP
  _DATA         0006     WORD    PUBLIC   'DATA'
  STACK         0040     PARA    STACK    'STACK'
_TEXT           001A     WORD    PUBLIC   'CODE'

Symbols:
Name            Type     Value   Attr

AGAIN           L NEAR   000D    _TEXT
DATA_IN         L BYTE   0000    _DATA
MAIN            F PROC   0000    _TEXT    Length = 001A
SUM             L BYTE   0005    _DATA

@CODE           TEXT     _TEXT
@CODESIZE       TEXT     0
@CPU            TEXT     0101h
@DATASIZE       TEXT     0
@FILENAME       TEXT     prog2_1
@VERSION        TEXT     510


25 Source Lines
25 Total Lines
25 Symbols

45756 + 410160 Bytes symbol space free

0 Warning Errors
0 Severe Errors

MASM List File for Program 2-1 


Write and run a program that adds four words of data and saves the result. The values will be 234DH, 
1DE6H, 3BC7H, and 566AH. Use DEBUG to verify the sum is D364. 


TITLE   PROG2-2 (EXE) PURPOSE: ADDS 4 WORDS OF DATA
PAGE    60,132
        .MODEL SMALL
        .STACK 64
;
        .DATA
DATA_IN DW      234DH,1DE6H,3BC7H,566AH
        ORG     10H
SUM     DW      ?
;
        .CODE
MAIN    PROC    FAR
        MOV     AX,@DATA
        MOV     DS,AX
        MOV     CX,04             ;set up loop counter CX=4
        MOV     DI,OFFSET DATA_IN ;set up data pointer DI
        MOV     BX,00             ;initialize BX
ADD_LP: ADD     BX,[DI]           ;add contents pointed at by [DI] to BX
        INC     DI                ;increment DI twice
        INC     DI                ;to point to next word
        DEC     CX                ;decrement loop counter
        JNZ     ADD_LP            ;jump if loop counter not zero
        MOV     SI,OFFSET SUM     ;load pointer for sum
        MOV     [SI],BX           ;store in data segment
        MOV     AH,4CH            ;set up return
        INT     21H               ;return to OS
MAIN    ENDP
        END     MAIN


After the program was assembled and linked, it was run using DEBUG: 
C>debug c:prog2-2.exe
-U CS:0 1
1068:0000 B86610        MOV     AX,1066
-D 1066:0 1F
1066:0000  4D 23 E6 1D C7 3B 6A 56-00 00 00 00 00 00 00 00   M#f.G;jV........
1066:0010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
-G

Program terminated normally
-D 1066:0 1F
1066:0000  4D 23 E6 1D C7 3B 6A 56-00 00 00 00 00 00 00 00   M#f.G;jV........
1066:0010  64 D3 00 00 00 00 00 00-00 00 00 00 00 00 00 00   dS..............
-Q

C>


Program 2-2 
Analysis of Program 2-2 


First notice that the 16-bit data (a word) is stored with the low-order byte first. For 
example, "234D" as defined in the data segment is stored as "4D23", meaning that the 
lower address, 0000, has the least significant byte, 4D, and the higher address, 0001, has 
the most significant byte, 23. This is shown in the DEBUG display of the data segment. 
Similarly, the sum, D364, is stored as 64D3. As discussed in Chapter 1, this method of low 
byte to low address and high byte to high address operand assignment is referred to in 
computer literature as "little endian." 
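
The byte order can be checked with a few lines of Python. This is only an illustrative sketch (the variable names are made up); it relies on the standard int.to_bytes and int.from_bytes methods to lay a 16-bit value out in increasing address order.

```python
word = 0x234D

# Bytes in increasing address order: offset 0000 first, then offset 0001
stored = word.to_bytes(2, byteorder="little")
print(stored.hex(" ").upper())

# Reading the two bytes back in the same order recovers the original word
restored = int.from_bytes(stored, byteorder="little")
print(f"{restored:04X}")
```

The first line printed is the pair 4D 23, matching the first two bytes of the DEBUG dump.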

Second, note that the address pointer is incremented twice, since the operand
being accessed is a word (two bytes). The program could have used "ADD DI,2" instead
of using "INC DI" twice. When storing the result of word addition, "MOV SI,OFFSET
SUM" was used to load the pointer (in this case 0010, as defined by ORG 0010H) for the
memory allocated for the label SUM, and then "MOV [SI],BX" was used to move the
contents of register BX to memory locations with offsets 0010 and 0011. As was done
previously, it could have been coded simply as "MOV SUM,BX", using direct addressing mode.




Program 2-2 uses the ORG directive. In previous programs where ORG was not 
used, the assembler would start at offset 0000 and use memory for each data item. The 
ORG directive can be used to set the offset addresses for data items. Although the pro- 
grammer cannot assign exact physical addresses, one is allowed to assign offset address- 
es. The ORG directive in Program 2-2 caused SUM to be stored at DS:0010, as can be 
seen by looking at the DEBUG display of the data segment. 
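
To see how ORG shapes the data segment, the layout of Program 2-2 can be imitated in Python. This is a sketch under stated assumptions: a 32-byte bytearray stands in for the data segment, the names DATA_IN and SUM are ordinary Python constants holding the offsets, and the standard struct module does the little-endian word access.

```python
import struct

data_segment = bytearray(0x20)   # 32 zero bytes, like the two rows DEBUG dumps
DATA_IN = 0x0000                 # first data item starts at offset 0000
SUM = 0x0010                     # ORG 10H moves the next item to offset 0010

# DATA_IN DW 234DH,1DE6H,3BC7H,566AH  (four little-endian words)
struct.pack_into("<4H", data_segment, DATA_IN, 0x234D, 0x1DE6, 0x3BC7, 0x566A)

bx = 0                           # MOV BX,00
di = DATA_IN                     # MOV DI,OFFSET DATA_IN
for _ in range(4):               # CX counts 4 iterations
    (word,) = struct.unpack_from("<H", data_segment, di)
    bx = (bx + word) & 0xFFFF    # ADD BX,[DI]  (BX is 16 bits wide)
    di += 2                      # INC DI twice
struct.pack_into("<H", data_segment, SUM, bx)   # MOV [SI],BX with SI = 0010

for offset in (0x00, 0x10):
    row = data_segment[offset:offset + 16]
    print(f"{offset:04X}  " + " ".join(f"{b:02X}" for b in row))
```

The second row printed starts with 64 D3, the sum D364 stored low byte first at offset 0010, while offsets 0008 through 000F stay unused because ORG skipped over them.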


Write and run a program that transfers 6 bytes of data from memory locations with offset of 0010H 
to memory locations with offset of 0028H. 


TITLE   PROG2-3 (EXE) PURPOSE: TRANSFERS 6 BYTES OF DATA
PAGE    60,132
        .MODEL SMALL
        .STACK 64
        .DATA
        ORG     10H
DATA_IN DB      25H,4FH,85H,1FH,2BH,0C4H
        ORG     28H
COPY    DB      6 DUP(?)
        .CODE
MAIN    PROC    FAR
        MOV     AX,@DATA
        MOV     DS,AX
        MOV     SI,OFFSET DATA_IN ;SI points to data to be copied
        MOV     DI,OFFSET COPY    ;DI points to copy of data
        MOV     CX,06H            ;loop counter = 6
MOV_LOOP: MOV   AL,[SI]           ;move the next byte from DATA area to AL
        MOV     [DI],AL           ;move the next byte to COPY area
        INC     SI                ;increment DATA pointer
        INC     DI                ;increment COPY pointer
        DEC     CX                ;decrement LOOP counter
        JNZ     MOV_LOOP          ;jump if loop counter not zero
        MOV     AH,4CH            ;set up to return
        INT     21H               ;return to OS
MAIN    ENDP
        END     MAIN


After the program was assembled and linked, it was run using DEBUG: 


C>debug prog2-3.exe
-u cs:0 1
1069:0000 B86610        MOV     AX,1066
-d 1066:0 2f
1066:0000  00 00 00 00 00 00 00 00-00 00
1066:0010  25 4F 85 1F 2B C4 00 00-00 00
1066:0020  00 00 00 00 00 00 00 00-00 00
-g

Program terminated normally
-d 1066:0 2f
1066:0000  00 00 00 00 00 00 00 00-00 00
1066:0010  25 4F 85 1F 2B C4 00 00-00 00
1066:0020  00 00 00 00 00 00 00 00-25 4F
-q

C>


Program 2-3 
Analysis of Program 2-3 


Program 2-3 shows the data segment being dumped before and after the program 
was run to verify that the data was copied successfully. Notice that C4 was coded in the
data segment as 0C4H. The leading zero is required by the assembler whenever the first
digit of a hex number is A through F, so that the number is not mistaken for a name.

This program uses two registers, SI and DI, as pointers to the data items being 
manipulated. The first is used as a pointer to the data item to be copied and the second as 
a pointer to the location the data item is to be copied to. With each iteration of the loop, 
both data pointers are incremented to point to the next byte. 
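
The two-pointer copy can be imitated in Python to see where each byte lands. This is an illustrative sketch only: a bytearray stands in for the data segment, and si, di, cx, and al are ordinary variables named after the registers they represent.

```python
data_segment = bytearray(0x30)      # offsets 0000 through 002F, all zero
DATA_IN, COPY = 0x10, 0x28          # offsets fixed by ORG 10H and ORG 28H
data_segment[DATA_IN:DATA_IN + 6] = bytes([0x25, 0x4F, 0x85, 0x1F, 0x2B, 0xC4])

si, di, cx = DATA_IN, COPY, 6       # MOV SI / MOV DI / MOV CX,06H
while cx != 0:                      # JNZ MOV_LOOP repeats until CX reaches zero
    al = data_segment[si]           # MOV AL,[SI]
    data_segment[di] = al           # MOV [DI],AL
    si += 1                         # INC SI
    di += 1                         # INC DI
    cx -= 1                         # DEC CX

print(" ".join(f"{b:02X}" for b in data_segment[COPY:COPY + 6]))
```

After the loop, offsets 0028 through 002D hold the same six bytes as offsets 0010 through 0015, and both pointers sit one byte past the end of their areas.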




Stack segment definition revisited 


One of the primary functions of the operating system is to determine the total 
amount of RAM installed on the PC and then manage it properly. The OS uses the portion 
it needs for the operating system and allocates the rest. Since memory requirements vary 
for different OS versions, a program cannot dictate the exact physical memory location 
for the stack or any segment. Since memory management is the responsibility of the OS, 
it will map Assembly programs into the memory of the PC with the help of LINK. 

Although in the OS environment a program can have multiple code segments and 
data segments, it is strongly recommended that it have only one stack segment, to prevent 
RAM fragmentation by the stack. It is the function of LINK to combine all different code 
and data segments to create a single executable program with a single stack, which is the 
stack of the system. Various options for segment definition are discussed in Chapter 7 and 
many of these concepts are explained there. 


Review Questions 


1. What is the purpose of the INC instruction?
2. What is the purpose of the DEC instruction?
3. In Program 2-1, why does the label AGAIN have a colon after it, whereas
   the label MAIN does not?
4. State the difference between the following two instructions:
        MOV BX,DATA1
        MOV BX,OFFSET DATA1
5. State the difference between the following two instructions:
        ADD AX,BX
        ADD AX,[BX]


SECTION 2.4: CONTROL TRANSFER INSTRUCTIONS 


In the sequence of instructions to be executed, it is often necessary to transfer pro- 
gram control to a different location. There are many instructions in the x86 to achieve 
this. This section covers the control transfer instructions available in the 8086 Assembly 
language. Before that, however, it is necessary to explain the concept of FAR and NEAR 
as it applies to jump and call instructions. 


FAR and NEAR 


If control is transferred to a memory location within the current code segment, it 
is NEAR. This is sometimes called intrasegment (within segment). If control is transferred 
outside the current code segment, it is a FAR or intersegment (between segments) jump. 
Since the CS:IP registers always point to the address of the next instruction to be
executed, they must be updated when a control transfer instruction is executed. In a NEAR jump,
the IP is updated and CS remains the same, since control is still inside the current code 
segment. In a FAR jump, because control is passing outside the current code segment, 
both CS and IP have to be updated to the new values. In other words, in any control trans- 
fer instruction such as jump or call, the IP must be changed, but only in the FAR case is 
the CS changed, too. 
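
The difference can be made concrete with a small Python sketch. It assumes the real-mode address calculation from Chapter 1 (physical address = CS shifted left one hex digit, plus IP); the function names and the far target 2000:0100 are hypothetical and chosen only for illustration.

```python
def physical_address(cs, ip):
    """Real-mode 8086 address: segment shifted left 4 bits plus offset, 20 bits wide."""
    return ((cs << 4) + ip) & 0xFFFFF


def near_jump(cs, ip, target_ip):
    """Intrasegment transfer: only IP changes, CS is kept."""
    return cs, target_ip & 0xFFFF


def far_jump(cs, ip, target_cs, target_ip):
    """Intersegment transfer: both CS and IP are replaced."""
    return target_cs & 0xFFFF, target_ip & 0xFFFF


cs, ip = 0x1067, 0x0011
near = near_jump(cs, ip, 0x000D)
far = far_jump(cs, ip, 0x2000, 0x0100)   # hypothetical far target
print(f"NEAR -> {near[0]:04X}:{near[1]:04X} = {physical_address(*near):05X}")
print(f"FAR  -> {far[0]:04X}:{far[1]:04X} = {physical_address(*far):05X}")
```

In the NEAR case the segment part of the result is still 1067; in the FAR case both halves of CS:IP are new.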


Conditional jumps 


Conditional jumps, summarized in Table 2-1, have mnemonics such as JNZ (jump 
not zero) and JC (jump if carry). In the conditional jump, control is transferred to a new 
location if a certain condition is met. The flag register is the one that indicates the current 
condition. For example, with "JNZ label", the processor looks at the zero flag to see if it 
is raised. If not, the CPU starts to fetch and execute instructions from the address of the 
label. If ZF = 1, it will not jump but will execute the next instruction below the JNZ. 




Table 2-1: 8086 Conditional Jump Instructions

Mnemonic    Condition Tested             "Jump IF ..."
JA/JNBE     (CF = 0) and (ZF = 0)        above/not below nor equal
JAE/JNB     CF = 0                       above or equal/not below
JB/JNAE     CF = 1                       below/not above nor equal
JBE/JNA     (CF or ZF) = 1               below or equal/not above
JC          CF = 1                       carry
JE/JZ       ZF = 1                       equal/zero
JG/JNLE     ((SF xor OF) or ZF) = 0      greater/not less nor equal
JGE/JNL     (SF xor OF) = 0              greater or equal/not less
JL/JNGE     (SF xor OF) = 1              less/not greater nor equal
JLE/JNG     ((SF xor OF) or ZF) = 1      less or equal/not greater
JNC         CF = 0                       not carry
JNE/JNZ     ZF = 0                       not equal/not zero
JNO         OF = 0                       not overflow
JNP/JPO     PF = 0                       not parity/parity odd
JNS         SF = 0                       not sign
JO          OF = 1                       overflow
JP/JPE      PF = 1                       parity/parity even
JS          SF = 1                       sign


Note:

“Above” and “below” refer to the relationship of two unsigned values; “greater” and “less” refer 
to the relationship of two signed values. 

(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1989) 
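
The distinction between "above/below" and "greater/less" is easiest to see by comparing the same two bytes both ways. The Python sketch below is an illustration, not Intel's definition: cmp8 is a made-up helper that computes the flags an 8-bit CMP would leave behind, and the two predicates apply the standard JA and JG flag conditions.

```python
def cmp8(a, b):
    """Return (CF, ZF, SF, OF) as CMP a,b would set them for 8-bit operands 0..255."""
    result = (a - b) & 0xFF
    cf = 1 if a < b else 0                 # borrow needed: unsigned a is below b
    zf = 1 if result == 0 else 0
    sf = result >> 7                       # sign bit of the 8-bit result
    # Signed overflow: operands differ in sign and the result's sign differs from a's
    of = 1 if ((a ^ b) & (a ^ result) & 0x80) else 0
    return cf, zf, sf, of


def ja_taken(a, b):
    """JA: unsigned 'above', taken when CF = 0 and ZF = 0."""
    cf, zf, sf, of = cmp8(a, b)
    return cf == 0 and zf == 0


def jg_taken(a, b):
    """JG: signed 'greater', taken when ZF = 0 and SF = OF."""
    cf, zf, sf, of = cmp8(a, b)
    return zf == 0 and sf == of


# FFH is 255 when treated as unsigned but -1 when treated as signed
print(ja_taken(0xFF, 0x01), jg_taken(0xFF, 0x01))
```

For FFH compared with 01H, JA would jump (255 is above 1) but JG would not (-1 is not greater than 1), which is why the programmer must pick the mnemonic that matches how the data is interpreted.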


Short jumps 


All conditional jumps are short jumps. In a short jump, the address of the target 
must be within -128 to +127 bytes of the IP. In other words, the conditional jump is a two- 
byte instruction: One byte is the opcode of the J condition and the second byte is a value 
between 00 and FF. An offset range of 00 to FF gives 256 possible addresses; these are 
split between backward jumps (to -128) and forward jumps (to +127). 

In a jump backward, the second byte is the 2's complement of the displacement 
value. To calculate the target address, the second byte is added to the IP of the instruction 
after the jump. To understand this, look at the unassembled code of Program 2-1 for the 
instruction JNZ AGAIN, repeated below. 


1067:0000 B86610        MOV     AX,1066
1067:0003 8ED8          MOV     DS,AX
1067:0005 B90500        MOV     CX,0005
1067:0008 BB0000        MOV     BX,0000
1067:000B B000          MOV     AL,00
1067:000D 0207          ADD     AL,[BX]
1067:000F 43            INC     BX
1067:0010 49            DEC     CX
1067:0011 75FA          JNZ     000D
1067:0013 A20500        MOV     [0005],AL
1067:0016 B44C          MOV     AH,4C
1067:0018 CD21          INT     21


The instruction "JNZ AGAIN" was assembled as "JNZ 000D", and 000D is the
address of the instruction with the label AGAIN. The instruction "JNZ 000D" has the
opcode 75 and the operand FA, which are located at offset addresses 0011 and 0012.
This is followed by "MOV SUM,AL", which is located beginning at offset address 0013.
The IP value of MOV, 0013, is added to FA to calculate the address of label AGAIN (0013
+ FA = 000D) and the carry is dropped. In reality, FA is the 2's complement representation
of -6, meaning that the address of the target is -6 bytes from the IP of the next instruction.

Similarly, the target address for a forward jump is calculated by adding the IP of 
the following instruction to the operand. In that case the displacement value is positive, as 
shown next. Below is a portion of a list file showing the opcodes for several conditional 
jumps. 


0005  8A 47 02     AGAIN:  MOV     AL,[BX]+2
0008  3C 61                CMP     AL,61H
000A  72 06                JB      NEXT
000C  3C 7A                CMP     AL,7AH
000E  77 02                JA      NEXT
0010  24 DF                AND     AL,0DFH
0012  88 04        NEXT:   MOV     [SI],AL


In the program above, "JB NEXT" has the opcode 72 and the operand 06 and
is located at IP = 000A and 000B. The jump will be 6 bytes from the next instruction,
which is IP = 000C. Adding gives us 000CH + 0006H = 0012H, which is the exact address
of the NEXT label. Look also at "JA NEXT", which has 77 and 02 for the opcode and
displacement, respectively. The IP of the following instruction, 0010, is added to 02 to get
0012, the address of the target location.
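
Both the backward and the forward calculation follow one rule, which the Python sketch below spells out. The function name short_jump_target is made up for illustration; it assumes a 2-byte short jump and treats the operand byte as a signed 8-bit displacement.

```python
def short_jump_target(ip_of_jump, operand_byte):
    """Target offset of a 2-byte short jump located at ip_of_jump."""
    next_ip = (ip_of_jump + 2) & 0xFFFF          # IP of the instruction after the jump
    # Interpret the operand as 2's complement: 80H..FFH mean -128..-1
    disp = operand_byte - 0x100 if operand_byte & 0x80 else operand_byte
    return (next_ip + disp) & 0xFFFF             # any carry out of 16 bits is dropped


print(f"{short_jump_target(0x0011, 0xFA):04X}")  # JNZ AGAIN in Program 2-1
print(f"{short_jump_target(0x000A, 0x06):04X}")  # JB NEXT
print(f"{short_jump_target(0x000E, 0x02):04X}")  # JA NEXT
```

The three calls reproduce the targets worked out in the text: 000D for the backward jump and 0012 for both forward jumps.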

It must be emphasized that regardless of whether the jump is forward or backward,
for conditional jumps the target address can never be more than -128
to +127 bytes away from the IP associated with the instruction following the jump (- for
the backward jump and + for the forward jump). If any attempt is made to violate this rule,
the assembler will generate a "relative jump out of range" message. These conditional
jumps are sometimes referred to as SHORT jumps.
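
The range check an assembler has to make can be sketched in Python as the reverse of the target calculation. This is an illustration only, not MASM's actual implementation: short_displacement is a hypothetical helper, and the ValueError merely borrows the wording of the assembler's message.

```python
def short_displacement(ip_of_jump, target):
    """Operand byte for a 2-byte short jump at ip_of_jump that lands on target."""
    disp = target - (ip_of_jump + 2)        # measured from the next instruction
    if not -128 <= disp <= 127:
        raise ValueError("relative jump out of range")
    return disp & 0xFF                      # negative values become 2's complement


print(f"{short_displacement(0x0011, 0x000D):02X}")   # JNZ AGAIN in Program 2-1
print(f"{short_displacement(0x000A, 0x0012):02X}")   # JB NEXT
```

A target such as offset 0100 reached from a jump at offset 0000 would need a displacement of +254, so the helper raises the error instead of returning a byte.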


Unconditional jumps


"JMP label" is an unconditional jump in which control is transferred unconditionally
to the target location label. The unconditional jump can take the following forms:

1. SHORT JUMP, which is specified by the format "JMP SHORT label". This is a
   jump in which the address of the target location is within -128 to +127 bytes of
   memory relative to the address of the current IP (the IP of the instruction after the
   jump). In this case, the opcode is EB and the operand is 1 byte in the range 00 to FF.
   The operand byte is added to the current IP to calculate the target address. If the
   jump is backward, the operand is in 2's complement. This is exactly like the J condition
   case. Coding the directive "SHORT" makes the jump more efficient in that it will be
   assembled into a 2-byte instruction instead of a 3-byte instruction.

2. NEAR JUMP, which is the default, has the format "JMP label". This is a near jump
   (within the current code segment) and, in its direct form, has the opcode E9. The
   target address can be any of the addressing modes of direct, register, register
   indirect, or memory indirect:
   (a) Direct JUMP is exactly like the short jump explained earlier, except that the
       target address can be anywhere in the segment within the range +32767 to -32768
       of the current IP.
   (b) Register indirect JUMP; the target address is in a register. For example, in
       "JMP BX", IP takes the value BX.
   (c) Memory indirect JMP; the target address is the contents of two memory locations
       pointed at by the register. Example: "JMP [DI]" will replace the IP with the
       contents of memory locations pointed at by DI and DI+1.

3. FAR JUMP, which has the format "JMP FAR PTR label". This is a jump out of the
   current code segment, meaning that not only the IP but also the CS is replaced with
   new values.
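
The size difference between the short and near direct forms can be seen by building the instruction bytes by hand. The Python sketch below is an assumption-laden illustration: encode_jmp is a made-up helper that always picks the shortest form that fits, whereas the text notes that the assembler uses the 3-byte near form by default unless SHORT is coded.

```python
def encode_jmp(ip_of_jump, target):
    """Machine code for a direct JMP at ip_of_jump: EB disp8 if it fits, else E9 disp16."""
    disp8 = target - (ip_of_jump + 2)              # short form is 2 bytes long
    if -128 <= disp8 <= 127:
        return bytes([0xEB, disp8 & 0xFF])
    disp16 = (target - (ip_of_jump + 3)) & 0xFFFF  # near form is 3 bytes long
    return bytes([0xE9, disp16 & 0xFF, disp16 >> 8])   # displacement stored low byte first


print(encode_jmp(0x0011, 0x000D).hex(" ").upper())   # close target: 2 bytes
print(encode_jmp(0x0100, 0x0300).hex(" ").upper())   # distant target: 3 bytes
```

The first call yields a 2-byte instruction beginning with EB; the second yields a 3-byte instruction beginning with E9, with the 16-bit displacement in little endian order.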


CALL statements 




Another control transfer instruction is the CALL instruction, which is used to call 
a procedure. CALLs to procedures are used to perform tasks that need to be performed 
frequently. This makes a program more structured. The target address could be in the current
segment, in which case it will be a NEAR call, or outside the current CS segment,
which is a FAR call. To make sure that after execution of the called subroutine the micro- 
processor knows where to come back, the microprocessor automatically saves the address 
of the instruction following the call on the stack. It must be noted that in the NEAR call 
only the IP is saved on the stack, and in a FAR call both CS and IP are saved. When a sub- 
routine is called, control is transferred to that subroutine and the processor saves the IP 
(and CS in the case of a FAR call) and begins to fetch instructions from the new location. 
After finishing execution of the subroutine, for control to be transferred back to the caller, 
the last instruction in the called subroutine must be RET (return). In the same way that the 
assembler generates different opcode for FAR and NEAR calls, the opcode for the RET 
instruction in the case of NEAR and FAR is different, as well. For NEAR calls, the IP is 
restored; for FAR calls, both CS and IP are restored. This will ensure that control is given 
back to the caller. As an example, assume that SP = FFFEH and the following code is a 
portion of the program unassembled in DEBUG: 


12B0:0200 BB1295   MOV  BX,9512
12B0:0203 E8FA00   CALL 0300
12B0:0206 B82F14   MOV  AX,142F


Since the CALL instruction is a NEAR call, meaning that it is in the same code 
segment (different IP, same CS), only IP is saved on the stack. In this case, the IP address 
of the instruction after the call is saved on the stack as shown in Figure 2-5. That IP will 
be 0206, which belongs to the "MOV AX, 142F" instruction. 
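
The bookkeeping for this NEAR call can be traced with a small Python model. This is a sketch under simplifying assumptions: the stack is a plain dictionary keyed by offset, and the helper names near_call and near_ret are invented for the illustration.

```python
def near_call(stack: dict, sp: int, call_offset: int, target: int):
    """Model a 3-byte NEAR CALL: push the return IP, then jump to target."""
    return_ip = (call_offset + 3) & 0xFFFF    # instruction after the CALL
    sp = (sp - 2) & 0xFFFF                    # SP is decremented by 2
    stack[sp] = return_ip & 0xFF              # low byte at the lower address
    stack[(sp + 1) & 0xFFFF] = return_ip >> 8 # high byte at the higher address
    disp = (target - return_ip) & 0xFFFF      # operand that follows opcode E8
    return target, sp, disp


def near_ret(stack: dict, sp: int):
    """Model a NEAR RET: pop 2 bytes from the stack into IP."""
    ip = stack[sp] | (stack[(sp + 1) & 0xFFFF] << 8)
    return ip, (sp + 2) & 0xFFFF


stack = {}
ip, sp, disp = near_call(stack, 0xFFFE, 0x0203, 0x0300)
print(f"IP={ip:04X} SP={sp:04X} operand={disp:04X}")   # IP=0300 SP=FFFC operand=00FA
print(f"FFFC={stack[0xFFFC]:02X} FFFD={stack[0xFFFD]:02X}")   # FFFC=06 FFFD=02
ip, sp = near_ret(stack, sp)
print(f"IP={ip:04X} SP={sp:04X}")                       # IP=0206 SP=FFFE
```

The operand 00FA stored low byte first gives the bytes E8 FA 00 seen in the DEBUG listing.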

The last instruction of the called subroutine 
must be a RET instruction that directs the CPU to POP 
the top 2 bytes of the stack into the IP and resume exe- 
cuting at offset address 0206. For this reason, the num- 
ber of PUSH and POP instructions (which alter the SP) 
must match. In other words, for every PUSH there must 
be a POP. 


12B0:0300 53       PUSH BX
12B0:0301 ...      ...
12B0:0309 5B       POP  BX
12B0:030A C3       RET


Figure 2-5. IP in the Stack


Assembly language subroutines


In Assembly language programming it is common to have one main program and many subroutines to
be called from the main program. This allows you to make each subroutine into a sepa- 
rate module. Each module can be tested separately and then brought together, as will be 
shown in Chapter 7. The main program is the entry point from the OS and is FAR, as 
explained earlier, but the subroutines called within the main program can be FAR or 
NEAR. Remember that NEAR routines are in the same code segment, while FAR routines 
are outside the current code segment. If there is no specific mention of FAR after the 
directive PROC, it defaults to NEAR, as shown in Figure 2-6. From now on, all code seg- 
ments will be written in that format. 


Rules for names in Assembly language 


By choosing label names that are meaningful, a programmer can make a program 
much easier to read and maintain. There are several rules that names must follow. First, 
each label name must be unique. The names used for labels in Assembly language pro- 
gramming consist of alphabetic letters in both upper- and lowercase, the digits 0 through 
9, and the special characters question mark (?), period (.), at (@), underline (_), and dollar
sign ($). The first character of the name must be an alphabetic character or special
character. It cannot be a digit. The period can only be used as the first character, but this
is not recommended since later versions of MASM have several reserved words that begin 
with a period. Names may be up to 31 characters long. A list of reserved words is given 
at the end of Appendix C. 
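
These rules can be expressed as a small checker. The Python sketch below is only an illustration: the function name name_problem is invented, the RESERVED set is a tiny made-up subset rather than the full list in Appendix C, and uniqueness is not checked because that requires the whole program.

```python
import string

ALLOWED = set(string.ascii_letters + string.digits + "?.@_$")
RESERVED = {"MOV", "ADD", "RET", "CALL", "PROC"}   # illustrative subset only


def name_problem(name: str):
    """Return None for an acceptable name, else a short reason."""
    if not name:
        return "empty name"
    if len(name) > 31:
        return "longer than 31 characters"
    if name[0] in string.digits:
        return "first character is a digit"
    if any(ch not in ALLOWED for ch in name):
        return "contains a character that is not allowed"
    if "." in name[1:]:
        return "period used after the first character"
    if name.upper() in RESERVED:
        return "reserved word"
    return None


for candidate in ["ADD_LP", "2ND_PASS", "MY.DATA", "LOOP-1", "MOV"]:
    print(candidate, "->", name_problem(candidate) or "ok")
```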


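        .CODE
MAIN    PROC    FAR             ;THIS IS THE ENTRY POINT FOR OS
        MOV     AX,@DATA
        MOV     DS,AX
        CALL    SUBR1
        CALL    SUBR2
        CALL    SUBR3
        MOV     AH,4CH
        INT     21H
MAIN    ENDP
;----------------------
SUBR1   PROC
        ...
        RET
SUBR1   ENDP
;----------------------
SUBR2   PROC
        ...
        RET
SUBR2   ENDP
;----------------------
SUBR3   PROC
        ...
        RET
SUBR3   ENDP
        END     MAIN            ;THIS IS THE EXIT POINT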


Figure 2-6. Shell of Assembly Language Subroutines 


Review Questions 


1. If control is transferred outside the current code segment, is it NEAR or FAR?
2. If a conditional jump is not taken, what is the next instruction to be executed?
3. In calculating the target address to jump to, a displacement is added to the contents
   of register ____.
4. What is the advantage in coding the operator "SHORT" in an unconditional jump?
5. A(n) ____ jump is within -128 to +127 bytes of the current IP. A(n) ____
   jump is within the current code segment, whereas a(n) ____
   jump is outside the current code segment.
6. How does the CPU know where to return to after executing a RET?
7. Describe briefly the function of the RET instruction.
8. State why the following label names are invalid.
   (a) GET.DATA (b) 1_NUM (c) TEST-DATA (d) RET


SECTION 2.5: DATA TYPES AND DATA DEFINITION 


The assembler supports all the various data types of the x86 microprocessor by 
providing data directives that define the data types and set aside memory for them. In this 
section we study these directives and how they are used to represent different data types 
of the x86. The application of these directives becomes clearer in the context of examples 
in subsequent chapters. 




x86 data types 


The 8088/86 microprocessor supports many data types, but none are longer than 
16 bits wide since the size of the registers is 16 bits. It is the job of the programmer to 
break down data larger than 16 bits (0000 to FFFFH, or 0 to 65535 in decimal) to be 
processed by the CPU. Many of these programs are shown in Chapter 3. The data types 
used by the 8088/86 can be 8-bit or 16-bit, positive or negative. If a number is less than 8 
bits wide, it still must be coded as an 8-bit register with the higher digits as zero. 
Similarly, if the number is less than 16 bits wide it must use all 16 bits, with the rest being 
0s. For example, the number 5 is only 3 bits wide (101) in binary, but the 8088/86 will
accept it as 05 or "0000 0101" in binary. The number 514 is "10 0000 0010" in binary,
but the 8088/86 will accept it as "0000 0010 0000 0010" in binary. The discussion of 
signed numbers is postponed until later chapters since their representation and application 
are unique. 
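
The padding described above can be reproduced with Python's string formatting. This is a sketch for checking the bit patterns only; the helper name as_operand is made up for the example.

```python
def as_operand(value: int, bits: int) -> str:
    """Show an unsigned value padded with leading 0s to the full operand width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in {bits} bits")
    pattern = format(value, f"0{bits}b")
    # group the bits in fours, the way the text writes them
    return " ".join(pattern[i:i + 4] for i in range(0, bits, 4))


print(as_operand(5, 8))      # 0000 0101
print(as_operand(514, 16))   # 0000 0010 0000 0010
```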


Assembler data directives 


All the assemblers designed for the x86 microprocessors have standardized the 
directives for data representation. The following are some of the data directives used by 
the x86 microprocessor and supported by all software vendors. 


ORG (origin) 


ORG is used to indicate the beginning of the offset address. The number that 
comes after ORG can be either in hex or in decimal. If the number is not followed by H, 
it is decimal and the assembler will convert it to hex. Although the ORG directive is used 
extensively in this book in the data segment to separate fields of data to make it more read- 
able for the student, it can also be used for the offset of the code segment (IP). 


DB (define byte) 


The DB directive is one of the most widely used data directives in the assembler. 
It allows allocation of memory in byte-sized chunks. This is indeed the smallest alloca- 
tion unit permitted. DB can be used to define numbers in decimal, binary, hex, and ASCII. 
For decimal, the D after the decimal number is optional, but using B (binary) and H (hexa- 
decimal) for the others is required. Regardless of which one is used, the assembler will 
convert numbers into hex. To indicate ASCII, simply place the string in single quotation
marks ('like this'). The assembler will assign the ASCII code for the numbers or characters
automatically. DB is the only directive that can be used to define ASCII strings larger
than two characters; therefore, it should be used for all ASCII data definitions.


0000  19                   DATA1   DB   25                ;DECIMAL
0001  89                   DATA2   DB   10001001B         ;BINARY
0002  12                   DATA3   DB   12H               ;HEX
0010                               ORG  0010H
0010  32 35 39 31          DATA4   DB   '2591'            ;ASCII NUMBERS
0018                               ORG  0018H
0018  00                   DATA5   DB   ?                 ;SET ASIDE A BYTE
0020                               ORG  0020H
0020  4D 79 20 6E 61 6D    DATA6   DB   'My name is Joe'  ;ASCII CHARACTERS
      65 20 69 73 20 4A
      6F 65


List File for DB Examples 


Following are some DB examples: 


DATA1   DB   25                ;DECIMAL
DATA2   DB   10001001B         ;BINARY
DATA3   DB   12H               ;HEX
        ORG  0010H
DATA4   DB   '2591'            ;ASCII NUMBERS
        ORG  0018H
DATA5   DB   ?                 ;SET ASIDE A BYTE
        ORG  0020H
DATA6   DB   'My name is Joe'  ;ASCII CHARACTERS


Either single or double quotes can be used around ASCII strings. This can be useful
for strings that must contain a single quote, such as "O'Leary".
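
The bytes that DB produces for the examples above can be checked with a few lines of Python. This is only a model of the directive, not the assembler itself: the function name db is invented, and "?" is represented by None and shown as 00 as in the list file, although the assembler gives it no defined value.

```python
def db(*items) -> bytes:
    """Model DB: one byte per number, one byte per ASCII character."""
    out = bytearray()
    for item in items:
        if item is None:                      # stands in for "?"
            out.append(0x00)
        elif isinstance(item, str):
            out += item.encode("ascii")       # characters are stored in order
        else:
            if not 0 <= item <= 0xFF:
                raise ValueError("operand does not fit in a byte")
            out.append(item)
    return bytes(out)


print(db(25, 0b10001001, 0x12).hex(" ").upper())   # 19 89 12
print(db("2591").hex(" ").upper())                 # 32 35 39 31
print(db("O'Leary").hex(" ").upper())              # 4F 27 4C 65 61 72 79
```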


DUP (duplicate)


DUP is used to duplicate a given number of characters. This can avoid a lot of
typing. For example, contrast the following two methods of filling six memory locations
with FFH:


        ORG  0030H
DATA7   DB   0FFH,0FFH,0FFH,0FFH,0FFH,0FFH   ;FILL 6 BYTES WITH FF
        ORG  38H
DATA8   DB   6 DUP(0FFH)                     ;FILL 6 BYTES WITH FF
; the following reserves 32 bytes of memory with no initial value given
        ORG  40H
DATA9   DB   32 DUP (?)                      ;SET ASIDE 32 BYTES
; DUP can be used inside another DUP
; the following fills 10 bytes with 99
DATA10  DB   5 DUP (2 DUP (99))              ;FILL 10 BYTES WITH 99


0030                       ORG  0030H
0030  FF FF FF FF FF FF    DATA7   DB   0FFH,0FFH,0FFH,0FFH,0FFH,0FFH ;FILL 6 BYTES WITH FF
0038                       ORG  38H
0038  0006[ FF ]           DATA8   DB   6 DUP(0FFH)          ;FILL 6 BYTES WITH FF
0040                       ORG  40H
0040  0020[ ?? ]           DATA9   DB   32 DUP (?)           ;SET ASIDE 32 BYTES
0060                       ORG  60H
0060  0005[ 0002[ 63 ] ]   DATA10  DB   5 DUP (2 DUP (99))   ;FILL 10 BYTES WITH 99


List File for DUP Examples 
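
The way DUP expands, including one DUP nested inside another, can be modeled in Python. The helper name dup is made up for this sketch, and each list element stands for one byte.

```python
def dup(count: int, *items) -> list:
    """Model 'count DUP (items)'; an item may itself be the result of dup()."""
    once = []
    for item in items:
        once.extend(item if isinstance(item, list) else [item])
    return once * count                       # repeat the whole group count times


data8 = dup(6, 0xFF)                          # 6 DUP(0FFH)
data10 = dup(5, dup(2, 99))                   # 5 DUP (2 DUP (99))
print(len(data8), bytes(data8).hex(" ").upper())     # 6 FF FF FF FF FF FF
print(len(data10), bytes(data10).hex(" ").upper())   # 10 bytes, each 63 (99 decimal)
```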


DW (define word)


DW is used to allocate memory 2 bytes (one word) at a time. The following are 
some examples of DW: 


        ORG  70H
DATA11  DW   954                            ;DECIMAL
DATA12  DW   100101010100B                  ;BINARY
DATA13  DW   253FH                          ;HEX
        ORG  78H
DATA14  DW   9,2,7,0CH,00100000B,5,'HI'     ;MISC. DATA
DATA15  DW   8 DUP (?)                      ;SET ASIDE 8 WORDS


0070                        ORG  70H
0070  03BA                  DATA11  DW   954                         ;DECIMAL
0072  0954                  DATA12  DW   100101010100B               ;BINARY
0074  253F                  DATA13  DW   253FH                       ;HEX
0078                        ORG  78H
0078  0009 0002 0007 000C   DATA14  DW   9,2,7,0CH,00100000B,5,'HI'  ;MISC. DATA
      0020 0005 4849
0086  0008[ ???? ]          DATA15  DW   8 DUP(?)                    ;SET ASIDE 8 WORDS


List File for DW Examples 
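
The list file shows each word as a 16-bit value, while memory holds its two bytes with the low byte first. A brief Python model makes the difference visible; the function name dw is invented for the sketch, and a two-character string is treated as the word shown in the list file (4849 for 'HI').

```python
def dw(*items) -> bytes:
    """Model DW: each operand becomes a 16-bit word stored low byte first."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            if not 1 <= len(item) <= 2:
                raise ValueError("DW accepts at most two ASCII characters")
            value = int.from_bytes(item.encode("ascii"), "big")   # 'HI' -> 4849H
        else:
            value = item
        out += value.to_bytes(2, "little")
    return bytes(out)


print(dw(954).hex(" ").upper())      # BA 03
print(dw(0x253F).hex(" ").upper())   # 3F 25
print(dw("HI").hex(" ").upper())     # 49 48
```

Note that DATA14 holds seven words, or 14 bytes, which is why DATA15 begins at offset 0086.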




EQU (equate) 


This is used to define a constant without occupying a memory location. EQU does 
not set aside storage for a data item but associates a constant value with a data label so 
that when the label appears in the program, its constant value will be substituted for the 
label. EQU can also be used outside the data segment, even in the middle of a code segment.
The following uses EQU for the counter constant in the immediate addressing mode:


COUNT EQU 25 


When executing the instructions "MOV CX,COUNT", the register CX will be 
loaded with the value 25. This is in contrast to using DB: 


COUNT   DB   25


When executing the same instruction "MOV CX,COUNT" it will be in the direct 
addressing mode. Now what is the real advantage of EQU? First, note that EQU can also 
be used in the data segment: 


COUNT EQU 25 
COUNTER1 DB COUNT 
COUNTER2 DB COUNT 


Assume that there is a constant (a fixed value) used in many different places in 
the data and code segments. By the use of EQU, one can change it once and the assem- 
bler will change all of them, rather than making the programmer try to find every location 
and correct it. 


DD (define doubleword) 


The DD directive is used to allocate memory locations that are 4 bytes (two 
words) in size. Again, the data can be in decimal, binary, or hex. In any case the data is 
converted to hex and placed in memory locations according to the rule of low byte to low 
address and high byte to high address. DD examples are: 


        ORG  00A0H
DATA16  DD   1023                     ;DECIMAL
DATA17  DD   10001001011001011100B    ;BINARY
DATA18  DD   5C2A57F2H                ;HEX
DATA19  DD   23H,34789H,65533


00A0                       ORG  00A0H
00A0  000003FF             DATA16  DD   1023                    ;DECIMAL
00A4  0008965C             DATA17  DD   10001001011001011100B   ;BINARY
00A8  5C2A57F2             DATA18  DD   5C2A57F2H               ;HEX
00AC  00000023 00034789    DATA19  DD   23H,34789H,65533
      0000FFFD


List File for DD Examples


DQ (define quadword)


DQ is used to allocate memory 8 bytes (four words) in size. This can be used to 
represent any variable up to 64 bits wide: 


        ORG  00C0H
DATA20  DQ   4523C2H    ;HEX
DATA21  DQ   'HI'       ;ASCII CHARACTERS
DATA22  DQ   ?          ;NOTHING


00C0                      ORG  00C0H
00C0  C223450000000000    DATA20  DQ   4523C2H   ;HEX
00C8  4948000000000000    DATA21  DQ   'HI'      ;ASCII CHARACTERS
00D0  0000000000000000    DATA22  DQ   ?         ;NOTHING


List File for DQ Examples 




DT (define ten bytes) 


DT is used for memory allocation of packed BCD numbers. The application of 
DT will be seen in the multibyte addition of BCD numbers in Chapter 3. For now, observe 
how they are located in memory. Notice that the "H" after the data is not needed. This 
directive allocates 10 bytes, but a maximum of 18 digits can be entered. 


        ORG  00E0H
DATA23  DT   867943569829   ;BCD
DATA24  DT   ?              ;NOTHING


00E0                        ORG  00E0H
00E0  299856437986000000    DATA23  DT   867943569829   ;BCD
      00
00EA  000000000000000000    DATA24  DT   ?              ;NOTHING
      00


List File for DT Examples 
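
The byte order in the list file for DATA23 can be reproduced with a short Python sketch. It assumes a positive number, so the tenth byte is simply left as 00, and the function name dt_packed_bcd is made up for the example.

```python
def dt_packed_bcd(digits: str) -> bytes:
    """Model DT: pack up to 18 decimal digits, two per byte, low byte first."""
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 18:
        raise ValueError("DT takes at most 18 decimal digits")
    padded = digits.zfill(20)                 # 10 bytes = 20 digit positions
    pairs = [padded[i:i + 2] for i in range(0, 20, 2)]   # most significant first
    # each pair of decimal digits becomes one byte whose hex form shows the digits
    return bytes(int(pair, 16) for pair in reversed(pairs))


print(dt_packed_bcd("867943569829").hex(" ").upper())
# 29 98 56 43 79 86 00 00 00 00
```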

DT can also be used to allocate 10-byte integers by using the "D" option:


DEC     DT   65535d     ;the assembler will convert the decimal
                        ;number to hex and store it


Figure 2-7 shows the memory dump of the data section, including all the exam- 
ples in this section. It is essential to understand the way operands are stored in memory. 
Looking at the memory dump shows that all of the data directives use the little endian for- 
mat for storing data, meaning that the least significant byte is located in the memory loca- 
tion of the lower address and the most significant byte resides in the memory location of 
the higher address. For example, look at the case of "DATA20 DQ 4523C2H", residing in
memory starting at offset 00C0H. C2, the least significant byte, is in location 00C0, with
23 in 00C1, and 45, the most significant byte, in 00C2. It must also be noted that for
ASCII data, only the DB directive can be used to define data of any length, and the use of
DD, DQ, or DT directives for ASCII strings of more than 2 bytes gives an assembly error.
When DB is used for ASCII numbers, notice that it places them in memory in the order in
which they are written, not in little endian order. For example, see "DATA4 DB '2591'" at
origin 10H: 32, ASCII for 2, is in memory location 10H; 35, ASCII for 5, is in 11H; and so on.
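
Both storage rules can be checked in Python. In this sketch memory is modeled as a dictionary keyed by offset, and the helper names store_le and store_ascii are invented for the illustration.

```python
def store_le(memory: dict, offset: int, value: int, size: int) -> None:
    """Store a number little endian: least significant byte at the lowest offset."""
    for i, byte in enumerate(value.to_bytes(size, "little")):
        memory[offset + i] = byte


def store_ascii(memory: dict, offset: int, text: str) -> None:
    """Store a DB string: characters go into memory in the order written."""
    for i, byte in enumerate(text.encode("ascii")):
        memory[offset + i] = byte


memory = {}
store_le(memory, 0x00C0, 0x4523C2, 8)         # DATA20 DQ 4523C2H
store_ascii(memory, 0x0010, "2591")           # DATA4 DB '2591'
for offset in (0x00C0, 0x00C1, 0x00C2, 0x0010, 0x0011):
    print(f"{offset:04X}: {memory[offset]:02X}")
# 00C0: C2, 00C1: 23, 00C2: 45, 0010: 32, 0011: 35
```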


Figure 2-7. DEBUG Dump of Data Segment




Review Questions 


1. The ____ directive is always used for ASCII strings longer than 2 bytes.
2. How many bytes are defined by the following?
        DATA_1  DB  6 DUP (4 DUP (0FFH))
3. Do the following two data segment definitions result in the same storage in
   bytes at offset 10H and 11H? If not, explain why.

        ORG  10H                    ORG  10H
DATA_1  DB   72             DATA_1  DW   7204H
DATA_2  DB   04H

4. The DD directive is used to allocate memory locations that are ____ bytes
   in length. The DQ directive is used to allocate memory locations
   that are ____ bytes in length.
5. State briefly the purpose of the ORG directive.
6. What is the advantage in using the EQU directive to define a constant value?
7. How many bytes are set aside by each of the following directives?
   (a) ASC_DATA DB '1234' (b) HEX_DATA DW 1234H
8. Does the little endian storage convention apply to the storage of ASCII data?


SECTION 2.6: FULL SEGMENT DEFINITION 


The way that segments have been defined in the programs above is a newer defi- 
nition referred to as simple segment definition. It is supported by Microsoft's MASM 5.0 
and higher, Borland's TASM version 1 and higher, and many other compatible assemblers. 
The older, more traditional definition is called the full segment definition. Although the 
simplified segment definition is much easier to understand and use, especially for begin- 
ners, it is good to become familiar with full segment definition since many older programs 
use it. 


Segment definition 


The SEGMENT and ENDS directives indicate to the assembler the beginning and 
ending of a segment and have the following format: 


label SEGMENT [ options] 
;place the statements belonging to this segment here 
label ENDS 


The label, or name, must follow naming conventions (see the end of Section 2.4) 
and must be unique. The [options] field gives important information to the assembler for 
organizing the segment, but is not required. The ENDS label must be the same label as in 
the SEGMENT directive. In the full segment definition, the ".MODEL" directive is not 
used. Further, the directives ".STACK", ".DATA", and ".CODE" are replaced by SEGMENT
and ENDS directives that surround each segment. Figure 2-8 shows the full segment
definition and simplified format, side by side. This is followed by Programs 2-2 and
2-3, rewritten using the full segment definition. 


Stack segment definition 


The stack segment shown below contains the line: "DB 64 DUP (?)" to reserve 
64 bytes of memory for the stack. The following three lines in full segment definition are 
comparable to ".STACK 64" in simple definition:


STSEG   SEGMENT           ;the "SEGMENT" directive begins the segment
        DB 64 DUP (?)     ;this segment contains only one line
STSEG   ENDS              ;the "ENDS" directive ends the segment




;FULL SEGMENT DEFINITION                ;SIMPLIFIED FORMAT
                                                .MODEL SMALL
;----- stack segment -----                      .STACK 64
name1   SEGMENT
        DB      64 DUP (?)
name1   ENDS
;----- data segment -----                       .DATA
name2   SEGMENT
;place data definitions here            ;place data definitions here
name2   ENDS
;----- code segment -----                       .CODE
name3   SEGMENT
MAIN    PROC    FAR                     MAIN    PROC    FAR
        ASSUME  ...                             MOV     AX,@DATA
        MOV     AX,name2                        MOV     DS,AX
        MOV     DS,AX                           ...
        ...
MAIN    ENDP                            MAIN    ENDP
name3   ENDS                                    END     MAIN
        END     MAIN


Figure 2-8. Full versus Simplified Segment Definition 


TITLE   PURPOSE: ADDS 4 WORDS OF DATA
PAGE    60,132
STSEG   SEGMENT
        DB      32 DUP (?)
STSEG   ENDS
;----------------------
DTSEG   SEGMENT
DATA_IN DW      234DH,1DE6H,3BC7H,566AH
        ORG     10H
SUM     DW      ?
DTSEG   ENDS
;----------------------
CDSEG   SEGMENT
MAIN    PROC    FAR
        ASSUME  CS:CDSEG,DS:DTSEG,SS:STSEG
        MOV     AX,DTSEG
        MOV     DS,AX
        MOV     CX,04              ;set up loop counter CX=4
        MOV     DI,OFFSET DATA_IN  ;set up data pointer DI
        MOV     BX,00              ;initialize BX
ADD_LP: ADD     BX,[DI]            ;add contents pointed at by [DI] to BX
        INC     DI                 ;increment DI twice
        INC     DI                 ;to point to next word
        DEC     CX                 ;decrement loop counter
        JNZ     ADD_LP             ;jump if loop counter not zero
        MOV     SI,OFFSET SUM      ;load pointer for sum
        MOV     [SI],BX            ;store in data segment
        MOV     AH,4CH             ;set up return
        INT     21H                ;return to OS
MAIN    ENDP
CDSEG   ENDS
        END     MAIN


Program 2-2, rewritten with full segment definition 




TITLE   PURPOSE: TRANSFERS 6 BYTES OF DATA
PAGE    60,132
STSEG   SEGMENT
        DB      32 DUP (?)
STSEG   ENDS
;----------------------
DTSEG   SEGMENT
        ORG     10H
DATA_IN DB      25H,4FH,85H,1FH,2BH,0C4H
        ORG     28H
COPY    DB      6 DUP(?)
DTSEG   ENDS
;----------------------
CDSEG   SEGMENT
MAIN    PROC    FAR
        ASSUME  CS:CDSEG,DS:DTSEG,SS:STSEG
        MOV     AX,DTSEG
        MOV     DS,AX
        MOV     SI,OFFSET DATA_IN  ;SI points to data to be copied
        MOV     DI,OFFSET COPY     ;DI points to copy of the data
        MOV     CX,06H             ;loop counter = 6
MOV_LOOP: MOV   AL,[SI]            ;move next byte from DATA area to AL
        MOV     [DI],AL            ;move the next byte to COPY area
        INC     SI                 ;increment DATA pointer
        INC     DI                 ;increment COPY pointer
        DEC     CX                 ;decrement LOOP counter
        JNZ     MOV_LOOP           ;jump if loop counter not zero
        MOV     AH,4CH             ;set up to return
        INT     21H                ;return to OS
MAIN    ENDP
CDSEG   ENDS
        END     MAIN


Program 2-3, rewritten with full segment definition 


Data segment definition 


In full segment definition, the SEGMENT directive names the data segment and 
must appear before the data. The ENDS directive marks the end of the data segment:


DTSEG   SEGMENT     ;the SEGMENT directive begins the segment
        ;define your data here
DTSEG   ENDS        ;the ENDS directive ends the segment


Code segment definition


The code segment also begins and ends with SEGMENT and ENDS directives:


CDSEG   SEGMENT     ;the SEGMENT directive begins the segment
        ;your code is here
CDSEG   ENDS        ;the ENDS directive ends the segment


In full segment definition, immediately after the PROC directive is the ASSUME 
directive, which associates segment registers with specific segments by assuming that the 
segment register is equal to the segment labels used in the program. If an extra segment 
had been used, ES would also be included in the ASSUME statement. The ASSUME 
statement is needed because a given Assembly language program can have several code
segments, one or two or three or more data segments and more than one stack segment,
but only one of each can be addressed by the CPU at a given time since there is only one 
of each of the segment registers available inside the CPU. Therefore, ASSUME tells the 
assembler which of the segments defined by the SEGMENT directives should be used. It 
also helps the assembler to calculate the offset addresses from the beginning of that segment.
For example, in "MOV AL,[BX]" the BX register holds an offset into the data segment.

Upon transfer of control from OS to the program, of the three segment registers, 
only CS and SS have the proper values. The DS value (and ES, if used) must be initial- 
ized by the program. This is done as follows in full segment definition: 


MOV AX, DTSEG ;DTSEG is the label for the data segment 
MOV DS,AX 


Using the emu8086 assembler 


There is a simple and popular assembler called emu8086 that one can use for 
assembling the 8086 Assembly language programs. It is available from the 
www.emu8086.com website. Examine Figures 2-9 to 2-13 for screenshots using 
emu8086. 


Download the emu8086 assembler from the following website: 
http://www.emu8086.com 


See a tutorial on emu8086 on the following website:
http://www.MicroDigitalEd.com 


Figure 2-9. emu8086 Screen Shot


Figure 2-11. emu8086 Screenshot of Variables for Program in Figure 2-10


Figure 2-13. emu8086 Screenshot of Variables for Program in Figure 2-12


EXE vs. COM Files 


All program examples so far were designed to be assembled and linked into EXE 
files. This section looks at the COM file, which like the EXE file contains the executable 
machine code and can be run at the OS level. 


Why COM files? 


There are occasions when, due to a limited amount of memory, one needs to have 
very compact code. This is the time when the COM file is useful. The fact that the EXE 
file can be of any size is one of the main reasons that EXE files are used so widely. On 
the other hand, COM files are used because of their compactness since they cannot be 
greater than 64K bytes. The reason for the 64K-byte limit is that the COM file must fit 
into a single segment, and since in the x86 the size of a segment is 64K bytes, the COM 
file cannot be larger than 64K. To limit the size of the file to 64K bytes requires defining 
the data inside the code segment and also using an area (the end area) of the code segment 
for the stack. One of the distinguishing features of the COM file program is the fact that 
in contrast to the EXE file, it has no separate data segment definition. One can summarize 
the differences between COM and EXE files as shown in Table 2-2. 


Table 2-2: EXE vs. COM File Format


EXE File                                    COM File
unlimited size                              maximum size 64K bytes
stack segment is defined                    no stack segment definition
data segment is defined                     data segment defined in code segment
code, data defined at any offset address    code and data begin at offset 0100H
larger file (takes more memory)             smaller file (takes less memory)


Another reason for the difference in the size of the EXE and COM files is the fact 
that the COM file does not have a header block. The header block, which occupies 512 
bytes of memory, precedes every EXE file and contains information such as size, address 
location in memory, and stack address of the EXE module. 


Review Questions 


1. The ____ (full, simple) segment definition is the newer definition style.
2. True or false. The ASSUME directive is used in simple segment definition.
3. In full segment definition, each segment begins with the ____ directive and ends
   with a matching ____ directive.


SECTION 2.7: FLOWCHARTS AND PSEUDOCODE 


Structured programming is a term used to denote programming techniques that 
can make a program easier to code, debug, and maintain over time. There are certain prin- 
ciples that every structured program should follow. Some of these are as follows: 

1. The program should be designed before it is coded. By using techniques of flowchart- 
ing or pseudocode, described below, the design of the program is clear to the person 
coding it as well as those who will maintain the program later. 

2. Using comments within the program and documentation accompanying the program 
also will help someone else figure out what the program does and how it does it. It 
may even help the programmer who wrote the program remember how it worked 
years later! 

3. The main routine should consist primarily of calls to subroutines that perform the 
work of the program. This is sometimes called top-down programming. Use subrou- 
tines to accomplish tasks that are repeated. This saves time in coding and also makes 
the program easier to read. Also use subroutines to perform related tasks, such as 
“housekeeping” chores to keep the main program clear of clutter. 

4. Data control is very important. It can be very frustrating and time consuming to track 
through a long program to find where a variable was changed. First of all, the programmer
should document the purpose of each variable, and which subroutines might
alter its value. Further, each subroutine should document its input and
output variables, and which input variables might be altered within it.


Flowcharts 


If you have taken any previous 
programming courses, you are probably 
familiar with flowcharting. Flowcharts 
use graphic symbols to represent differ- 
ent types of program operations. These 
symbols are connected together into a 
flowchart to show the flow of execution 
of the program. Figure 2-14 shows some 
of the more commonly used symbols. 
Flowchart templates are available to 
help you draw the symbols quickly and 
neatly. 


Pseudocode 


Flowcharting has been standard practice in industry for decades. However, some
programmers find limitations in using flowcharts, such as the fact that you can’t write
much in the little boxes, and it is hard to get the “big picture” of what the program does
without getting bogged down in the details. An alternative to using flowcharts is
pseudocode, which involves writing brief descriptions of the flow of the code. Figures 2-15
through 2-19 show flowcharts and pseudocode for commonly used control structures. Then
we show an example of flowcharts and pseudocode for selected programs in this chapter.


Figure 2-14. Commonly Used Flowchart Symbols


Control structures 


Structured programming uses three basic types of program control structures: 
sequence, control, and iteration. Sequence is simply executing instructions one after the 
other. Figure 2-15 shows how sequence can be represented in pseudocode and flowcharts. 
Figures 2-16 and 2-17 show two control programming structures: IF-THEN and IF- 
THEN-ELSE in both pseudocode and flowcharts. Note in all of these figures that “state- 
ment” can indicate one statement or a group of statements. For example, a statement might 
be “Compute net pay.” Obviously it would require many lines of code to complete this 
“statement.” 


Figures 2-18 and 2-19 show two iteration control structures: REPEAT-UNTIL and 
WHILE-DO. Both structures execute a statement or group of statements repeatedly. The 
difference between them is that the REPEAT-UNTIL structure always executes the state- 
ment(s) at least once, and checks the condition after each iteration, whereas the WHILE- 
DO may not execute the statement(s) at all since the condition is checked at the beginning 
of each iteration. 
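
The difference shows up when the condition is already satisfied on entry. The Python sketch below is only an illustration with made-up function names; Python has no REPEAT-UNTIL statement, so that structure is written as a loop that tests its condition at the bottom.

```python
def repeat_until(count: int) -> int:
    """REPEAT statement(s) UNTIL count <= 0; returns how often the body ran."""
    executed = 0
    while True:
        executed += 1             # the statement(s)
        count -= 1
        if count <= 0:            # condition checked after each iteration
            break
    return executed


def while_do(count: int) -> int:
    """WHILE count > 0 DO statement(s); returns how often the body ran."""
    executed = 0
    while count > 0:              # condition checked before each iteration
        executed += 1             # the statement(s)
        count -= 1
    return executed


print(repeat_until(3), while_do(3))   # 3 3
print(repeat_until(0), while_do(0))   # 1 0
```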

Figure 2-20 shows a flowchart versus pseudocode for Program 2-1. In this example,
more program details are given than one usually finds. For example, this flowchart
shows steps for initializing and decrementing counters. Another programmer may not
include these steps in the flowchart or pseudocode. Notice that housekeeping chores such
as initializing the data segment register in the MAIN procedure are not included in the
flowchart or pseudocode.

Code specific to a certain language or operating platform is not described in the
pseudocode or flowchart. Ideally, one could take a flowchart or pseudocode for
a given program and then code the program in any language. For example, if a
company needed to have both a C and BASIC language version of the same
program, these two programs would have the same flowchart or pseudocode, even
though the actual program code would look quite different.

It is important to remember that the purpose of flowcharts or pseudocode
is to show the flow of the program and what the program does, not the specific
Assembly language instructions that accomplish the program’s objectives.

Notice also that pseudocode gives the same information in a much more compact
form compared to the flowchart. Often pseudocode is written in layers, in a top-down
manner, so that the top level shows the flow of the program and subsequent levels show
more details of how the program accomplishes its assigned tasks.


Statement 1
Statement 2


Figure 2-15. SEQUENCE Pseudocode vs. Flowchart


IF (condition) THEN
    Statement 1
ELSE
    Statement 2


Figure 2-16. IF-THEN-ELSE Pseudocode vs. Flowchart


IF (condition) THEN
    Statement 1


Figure 2-17. IF-THEN Pseudocode vs. Flowchart


REPEAT 
Statement 1 
UNTIL (condition) 


Figure 2-18. REPEAT-UNTIL Pseudocode vs. Flowchart 


WHILE (condition) DO
    Statement 1


Figure 2-19. WHILE-DO Pseudocode vs. Flowchart 
Review Questions 


1. List four principles of structured programming.
2. ________ (flowcharts, pseudocode) use(s) graphic symbols to illustrate program
   flow.
3. Name two control programming structures. 
4. True or false. The WHILE-DO control structure always executes the statement(s) at 
least once. 


Count = 5
Repeat
    Add next byte
    Increment pointer
    Decrement count
Until Count = 0
Store SUM

Figure 2-20. Flowchart vs. Pseudocode for Program 2-1


PROBLEMS 


SECTION 2.1: DIRECTIVES AND A SAMPLE PROGRAM 
SECTION 2.2: ASSEMBLE, LINK, AND RUN A PROGRAM 
SECTION 2.3: MORE SAMPLE PROGRAMS 


1. Rewrite Program 2-3 to transfer one word at a time instead of one byte.
2. List the steps in getting a ready-to-run program.
3. Which program produces the ".exe" file?
4. Which program produces the ".obj" file?
5. True or false: The ".lst" file is produced by the assembler regardless of whether or not
   the programmer wants it.

6. The source program file must have the ".asm" extension in some assemblers such as 
MASM. Is this true for the assembler you are using? 

7. Circle one: The linking process comes (after, before) assembling. 


SECTION 2.4: CONTROL TRANSFER INSTRUCTIONS 




8. In some applications it is common practice to save all registers at the beginning of a 
subroutine. Assume that SP = 1288H before a subroutine CALL. Show the contents 
of the stack pointer and the exact memory contents of the stack after PUSHF for the 
following: 

    1132:0450       CALL PROC1
    1132:0453       INC  BX

    PROC1           PROC
                    PUSH AX
                    PUSH BX
                    PUSH CX
                    PUSH DX
                    PUSH SI
                    PUSH DI
                    PUSHF

    PROC1           ENDP

9. To restore the original information inside the CPU at the end of a CALL to a subrou- 
tine, the sequence of POP instructions must follow a certain order. Write the sequence 
of POP instructions that will restore the information in Problem 8. At each point, show 
the contents of the SP. 

10. When a CALL is executed, how does the CPU know where to return? 

11. In a FAR CALL, ________ and ________ are saved on the stack,
whereas in a NEAR CALL, ________ is saved on the stack.

12. Compare the number of bytes of stack taken due to NEAR and FAR CALLs. 

13. Find the contents of the stack and stack pointer after execution of the CALL instruc- 
tion shown below. SUM is a near procedure. Assume the value SS:1296 right before 
the execution of CALL. 

CS : IP
2450:673A CALL SUM 
2450:673D DEC AH 

14. The following is a section of BIOS of the IBM PC which is described in detail in 
Chapter 3. All the jumps below are short jumps, meaning that the labels are in the 
range —128 to +127. 


    IP      Code
    E06C    733F            JNC  ERROR1
    E072    7139            JNO  ERROR1
    E08C    8ED8    C8:     MOV  DS,AX
    E0A7    EBE3            JMP  C8
    E0AD    F4      ERROR1: HLT

Verify the address calculations of:
(a) JNC ERROR1 (b) JNO ERROR1 (c) JMP C8


SECTION 2.5: DATA TYPES AND DATA DEFINITION 


15. Find the precise offset location of each ASCII character or data in the following: 


ORG 20H 
DATA1 DB UN SSOOS 5 SS LAs 
ORG 40H 
DATA2 DB 'Name: John Jones'
ORG 60H 
DATA3 DB USOGA ZE 
ORG 70H 
DATA4 DW 2560H, 1000000000110B 
DATA5 DW 49 
ORG 80H 
DATA6 DD 25697F6EH 
DATA7 DQ 9E7BA21C99F2H
ORG 90H
DATA8 DT 439997924999828
DATA9 DB 6 DUP (0EEH)


16. The following program contains some errors. Fix the errors and make the program
run correctly. Verify it through the DEBUG program. This program adds four words
and saves the result.
TITLE   PROBLEM (EXE) PROBLEM 16 PROGRAM
PAGE    60,132
.MODEL SMALL
.STACK 32
.DATA
DATA    DW   234DH,DE6H,3BC7H,566AH
        ORG  10H
SUM     DW   ?
.CODE
START:  PROC FAR
        MOV  CX,04              ;SET UP LOOP COUNTER CX=4
        MOV  BX,0               ;INITIALIZE BX TO ZERO
        MOV  DI,OFFSET DATA     ;SET UP DATA POINTER BX
LOOP1:  ADD  BX,[DI]            ;ADD CONTENTS POINTED AT BY [DI] TO BX
        INC  DI                 ;INCREMENT DI
        JNZ  LOOP1              ;JUMP IF COUNTER NOT ZERO
        MOV  SI,OFFSET RESULT   ;LOAD POINTER FOR RESULT
        MOV  [SI],BX            ;STORE THE SUM
        MOV  AH,4CH
        INT  21H
START   ENDP
        END  STRT


ANSWERS TO REVIEW QUESTIONS 


SECTION 2.1: DIRECTIVES AND A SAMPLE PROGRAM 


1. Pseudo-instructions direct the assembler as to how to assemble the program.
2. Instructions, pseudo-instructions or directives
3.          .MODEL SMALL
            .STACK 64
            .DATA
   HIGH_DAT DB   95
            .CODE
   START    PROC FAR
            MOV  AX,@DATA
            MOV  DS,AX
            MOV  AH,HIGH_DAT
            MOV  BH,AH
            MOV  DL,BH
            MOV  AH,4CH
            INT  21H
   START    ENDP
            END  START
4. (1) There is no ENORMOUS model.
   (2) ENDP label does not match label for PROC directive.
   (3) .CODE and .DATA directives need to be switched.
   (4) "MOV AX,DATA" should be "MOV AX,@DATA".
   (5) "MOV DS,@DATA" should be "MOV DS,AX".
   (6) END must have the entry point label "MAIN".


SECTION 2.2: ASSEMBLE, LINK, AND RUN A PROGRAM 


1. (a) MASM must have the ".asm" file as input.
   (b) LINK must have the ".obj" file as input.

2. Editor outputs: (b) .asm
   Assembler outputs: (a) .obj, (d) .lst, and (e) .crf files
   Linker outputs: (c) .exe and (f) .map files


SECTION 2.3: MORE SAMPLE PROGRAMS 
1. Increments the operand; that is, it causes 1 to be added to the operand.
2. Decrements the operand; that is, it causes 1 to be subtracted from the operand.
3. A colon is required after labels referring to instructions; colons are not placed
   after labels for directives.
4. The first moves the contents of the word beginning at offset DATA1, and the
   second moves the offset address of DATA1.
5. The first adds the contents of BX to AX, and the second adds the contents of the
   memory location at offset BX.


SECTION 2.4: CONTROL TRANSFER INSTRUCTIONS 


1. FAR
2. The instruction right below the jump
3. IP
4. The machine code for the instruction will take up 1 less byte.
5. SHORT, NEAR, FAR
6. The contents of CS and IP were stored on the stack when the call executed.
7. It restores the contents of CS:IP and returns control to the instruction immediately
   following the CALL.
8. (a) GET.DATA, invalid because "." is only allowed as the first character
   (b) 1_NUM, because the first character cannot be a number
   (c) TEST-DATA, because "-" is not allowed
   (d) RET, is a reserved word


SECTION 2.5: DATA TYPES AND DATA DEFINITION 


1. DB 

2. 24 

3. No because of the little endian storage conventions, which will cause the word 
"7204H" to be stored with the lower byte (04) at offset 10H and the upper byte at 
offset 11H; DB allocates each byte as it is defined. 


gaT a SIENA tr 


It is used to assign the offset address. 
If the value is to be changed later, it can be changed in one place instead of at every 
occurrence. 


(a) 4 (b)2 
No 


So Oa 


SECTION 2.6: FULL SEGMENT DEFINITION 


1. Simple 
2. False
3. SEGMENT, ENDS 


SECTION 2.7: FLOWCHARTS AND PSEUDOCODE 
1. Design before coding, use top-down programming, documentation, data control
2. Flowcharts
3. IF-THEN and IF-THEN-ELSE
4. False


CHAPTER 3 


ARITHMETIC AND LOGIC 
INSTRUCTIONS 
AND PROGRAMS 


OBJECTIVES 
Upon completion of this chapter, you will be able to: 


>> Demonstrate how 8-bit and 16-bit unsigned numbers are added in the x86
>> Convert data to any of the forms: ASCII, packed BCD, unpacked BCD
>> Explain the effect of unsigned arithmetic instructions on the flags
>> Code the following Assembly language unsigned arithmetic instructions:
     Addition instructions ADD and ADC
     Subtraction instructions SUB and SBB
     Multiplication and division instructions MUL and DIV
>> Code BCD arithmetic instructions:
     DAA, DAS
>> Code the following Assembly language logic instructions:
     AND, OR, and XOR
     Logical shift instructions SHR and SHL
     The compare instruction CMP
>> Code bitwise rotation instructions ROR, ROL, RCR, and RCL
>> Demonstrate an ability to use all of the above instructions in
   Assembly language programs
>> Perform bitwise manipulation using the C language


In this chapter, most of the arithmetic and logic instructions are discussed and pro- 
gram examples are given to illustrate the application of these instructions. Unsigned num- 
bers are used in this discussion of arithmetic and logic instructions. Signed numbers are 
discussed separately in Chapter 6. In Section 3.1 we examine the addition and subtrac- 
tion of unsigned numbers. The multiplication and division of unsigned numbers are dis- 
cussed in Section 3.2. The logic instructions and programs are covered in Section 3.3. 
Section 3.4 is dedicated to BCD and ASCII data conversion, while the instructions relat- 
ed to rotate and shift operations are examined in Section 3.5. The last section of the chap- 
ter describes bitwise operations in the C language. 


SECTION 3.1: UNSIGNED ADDITION AND SUBTRACTION 


Unsigned numbers are defined as data in which all the bits are used to represent 
data and no bits are set aside for the positive or negative sign. This means that the operand 
can be between 00 and FFH (0 to 255 decimal) for 8-bit data and between 0000 and 
FFFFH (0 to 65535 decimal) for 16-bit data. This section covers the ADD and SUB 
instructions. 


Addition of unsigned numbers 


The form of the ADD instruction is 
ADD destination,source ;destination = destination + source 


The instructions ADD and ADC are used to add two operands. The destination 
operand can be a register or in memory. The source operand can be a register, in memo- 
ry, or immediate. Remember that memory-to-memory operations are never allowed in x86 
Assembly language. The instruction could change any of the ZF, SF, AF, CF, or PF bits of 
the flag register, depending on the operands involved. The effect of the ADD instruction 
on the overflow flag is discussed in Chapter 6 since it is used in signed number operations. 
Look at Example 3-1. 


Example 3-1 
Show how the flag register is affected by 


        MOV  AL,0F5H
        ADD  AL,0BH

Solution:
          F5H          1111 0101
        + 0BH        + 0000 1011
         100H        1 0000 0000


After the addition, the AL register (destination) contains 00 and the flags are as follows: 
CF = 1, since there is a carry out from D7 

SF = 0, the status of D7 of the result 

PF = 1, the number of 1s is zero (zero is an even number) 

AF = 1, there is a carry from D3 to D4 

ZF = 1, the result of the action is zero (for the 8 bits) 
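The flag settings listed for this 8-bit addition can be cross-checked with a short C model. This is a sketch only: the Add8 structure and the names add8, even_parity, and check_example_3_1 are assumptions introduced here for illustration, and the overflow flag is left out because it belongs to signed arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of an 8-bit ADD result and the flags discussed here. */
typedef struct {
    uint8_t result;
    int cf, zf, sf, af, pf;
} Add8;

/* PF = 1 when the 8-bit result holds an even number of 1 bits. */
static int even_parity(uint8_t v)
{
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (v >> i) & 1;
    return (ones % 2) == 0;
}

/* Model of "ADD dest,src" for byte operands. */
Add8 add8(uint8_t dest, uint8_t src)
{
    unsigned wide = (unsigned)dest + src;   /* 9-bit sum */
    Add8 r;
    r.result = (uint8_t)wide;
    r.cf = wide > 0xFF;                               /* carry out of D7 */
    r.zf = r.result == 0;                             /* 8-bit result is zero */
    r.sf = (r.result >> 7) & 1;                       /* copy of D7 */
    r.af = ((dest & 0x0F) + (src & 0x0F)) > 0x0F;     /* carry from D3 to D4 */
    r.pf = even_parity(r.result);
    return r;
}

/* F5H + 0BH: AL becomes 00 with CF, PF, AF, and ZF set and SF clear. */
void check_example_3_1(void)
{
    Add8 r = add8(0xF5, 0x0B);
    assert(r.result == 0x00);
    assert(r.cf == 1 && r.sf == 0 && r.pf == 1 && r.af == 1 && r.zf == 1);
}
```

The sum is computed in a wider type on purpose, so the carry out of D7 is visible before the result is truncated to 8 bits.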


In discussing addition, the following two cases will be examined: 


1. Addition of individual byte and word data 
2. Addition of multibyte data 




Write a program to calculate the total sum of 5 bytes of data. Each byte represents the daily 
wages of a worker. This person does not make more than $255 (FFH) a day. The decimal data is 
as follows: 125, 235, 197, 91, and 48. 


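TITLE   PROG3-1A (EXE) ADDING 5 BYTES
PAGE    60,132
.MODEL SMALL
.STACK 64
        .DATA
COUNT   EQU  05
DATA    DB   125,235,197,91,48
SUM     DW   ?
        .CODE
MAIN    PROC FAR
        MOV  AX,@DATA
        MOV  DS,AX
        MOV  CX,COUNT           ;CX is the loop counter
        MOV  SI,OFFSET DATA     ;SI is the data pointer
        MOV  AX,00              ;AX will hold the sum
BACK:   ADD  AL,[SI]            ;add the next byte to AL
        JNC  OVER               ;if no carry, continue
        INC  AH                 ;else accumulate carry in AH
OVER:   INC  SI                 ;increment data pointer
        DEC  CX                 ;decrement loop counter
        JNZ  BACK               ;if not finished, go add next byte
        MOV  SUM,AX             ;store sum
        MOV  AH,4CH
        INT  21H                ;go back to OS
MAIN    ENDP
        END  MAIN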


Program 3-1a 


CASE 1: Addition of individual byte and word data 


In Chapter 2 there was a program that added 5 bytes of data. The total sum was
purposely kept less than FFH, the maximum value an 8-bit register can hold. To calculate
the total sum of any number of operands, the carry flag should be checked after the
addition of each operand. Program 3-1a uses AH to accumulate carries as the operands are
added to AL.


Analysis of Program 3-1a 


These numbers are converted to hex by the assembler as follows: 125 = 7DH, 235
= 0EBH, 197 = 0C5H, 91 = 5BH, 48 = 30H. Three iterations of the loop are shown below.
The tracing of the program is left to the reader as an exercise.


1. In the first iteration of the loop, 7DH is added to AL with CF = 0 and AH = 00. CX =
04 and ZF = 0.

2. In the second iteration of the loop, EBH is added to AL, which results in AL = 68H
and CF = 1. Since a carry occurred, AH is incremented. CX = 03 and ZF = 0.

3. In the third iteration, C5H is added to AL, which makes AL = 2DH. Again a carry
occurred, so AH is incremented again. CX = 02 and ZF = 0.


This process continues until CX = 00 and the zero flag becomes 1, which will
cause JNZ to fall through. Then the result will be saved in the word-sized memory set
aside in the data segment. Although this program works correctly, due to pipelining it is
strongly recommended that the following lines of the program be replaced:

Replace these lines          With these lines
BACK:   ADD  AL,[SI]         BACK:   ADD  AL,[SI]
        JNC  OVER                    ADC  AH,00     ;add 1 to AH if CF=1
        INC  AH                      INC  SI
OVER:   INC  SI

The "ADC AH, 00" instruction in reality means add 00 + AH + CF and place the 
result in AH. This is much more efficient since the instruction "JNC OVER" has to empty 
the queue of pipelined instructions and fetch the instructions from the OVER target every 
time the carry is zero (CF = 0). 
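The carry-accumulating loop can be modeled in C to confirm the total for the five daily wages. This is a sketch under stated assumptions: sum_bytes and check_program_3_1a are names invented here, the variables al and ah merely stand in for the two halves of AX, and the carry is added unconditionally in the style of "ADC AH,00" rather than with a branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums `count` bytes the way the loop does: AL takes each byte,
   and AH collects one for every carry out of AL. */
uint16_t sum_bytes(const uint8_t *data, size_t count)
{
    uint8_t al = 0, ah = 0;                          /* MOV AX,00 */
    for (size_t i = 0; i < count; i++) {
        unsigned wide = (unsigned)al + data[i];      /* ADD AL,[SI] */
        unsigned cf = wide >> 8;                     /* carry out of D7 */
        al = (uint8_t)wide;
        ah = (uint8_t)(ah + cf);                     /* ADC AH,00 */
    }
    return (uint16_t)((ah << 8) | al);               /* AX = AH:AL */
}

/* 125 + 235 + 197 + 91 + 48 = 696 = 02B8H */
void check_program_3_1a(void)
{
    const uint8_t wages[5] = {125, 235, 197, 91, 48};
    assert(sum_bytes(wages, 5) == 0x02B8);
}
```

Because the carry is added on every pass, the loop body contains no branch, which is the point of preferring ADC over the JNC/INC pair.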

The addition of many word operands works the same way. Register AX (or CX,
DX, or BX) could be used as the accumulator and BX (or any general-purpose 16-bit
register) for keeping the carries. Program 3-1b is the same as Program 3-1a, rewritten for
word addition.


Write a program to calculate the total sum of five words of data. Each data value represents the
yearly wages of a worker. This person does not make more than $65,535 (FFFFH) a year. The
decimal data is as follows: 27345, 28521, 29533, 30105, and 32375.


TITLE   PROG3-1B (EXE) ADDING 5 WORDS
PAGE    60,132
.MODEL SMALL
.STACK 64
        .DATA
COUNT   EQU  05
DATA    DW   27345,28521,29533,30105,32375
        ORG  0010H
SUM     DW   2 DUP(?)
        .CODE
MAIN    PROC FAR
        MOV  AX,@DATA
        MOV  DS,AX
        MOV  CX,COUNT           ;CX is the loop counter
        MOV  SI,OFFSET DATA     ;SI is the data pointer
        MOV  AX,00              ;AX will hold the sum
        MOV  BX,AX              ;BX will hold the carries
BACK:   ADD  AX,[SI]            ;add the next word to AX
        ADC  BX,0               ;add carry to BX
        INC  SI                 ;increment data pointer twice
        INC  SI                 ;to point to next word
        DEC  CX                 ;decrement loop counter
        JNZ  BACK               ;if not finished, continue adding
        MOV  SUM,AX             ;store the sum
        MOV  SUM+2,BX           ;store the carries
        MOV  AH,4CH
        INT  21H                ;go back to OS
MAIN    ENDP
        END  MAIN


Program 3-1b 


TITLE   PROG3-2 (EXE) MULTIWORD ADDITION
PAGE    60,132
.MODEL SMALL
.STACK 64
        .DATA
DATA1   DQ   548FB9963CE7H
        ORG  0010H
DATA2   DQ   3FCD4FA23B8DH
        ORG  0020H
DATA3   DQ   ?
        .CODE
MAIN    PROC FAR
        MOV  AX,@DATA
        MOV  DS,AX
        CLC                     ;clear carry before first addition
        MOV  SI,OFFSET DATA1    ;SI is pointer for operand1
        MOV  DI,OFFSET DATA2    ;DI is pointer for operand2
        MOV  BX,OFFSET DATA3    ;BX is pointer for the sum
        MOV  CX,04              ;CX is the loop counter
BACK:   MOV  AX,[SI]            ;move the first operand to AX
        ADC  AX,[DI]            ;add the second operand to AX
        MOV  [BX],AX            ;store the sum
        INC  SI                 ;point to next word of operand1
        INC  SI
        INC  DI                 ;point to next word of operand2
        INC  DI
        INC  BX                 ;point to next word of sum
        INC  BX
        LOOP BACK               ;if not finished, continue adding
        MOV  AH,4CH
        INT  21H                ;go back to OS
MAIN    ENDP
        END  MAIN


Program 3-2 


CASE 2: Addition of multiword numbers 


Assume a program is needed that will add the total U.S. budget for the last 100
years or the mass of all the planets in the solar system. In cases like this, the numbers
being added could be up to 8 bytes wide or more. Since registers are only 16 bits wide (2
bytes), it is the job of the programmer to write the code to break down these large
numbers into smaller chunks to be processed by the CPU. If a 16-bit register is used and the
operand is 8 bytes wide, that would take a total of four iterations. However, if an 8-bit
register is used, the same operands would require eight iterations. This obviously takes more
time for the CPU. This is one reason to have wide registers in the design of the CPU.
Powerful new CPUs such as the Itanium have registers that are 64 bits wide and larger.


Analysis of Program 3-2 


In writing this program, the first thing to be decided was the directive used for
coding the data in the data segment. DQ was chosen since it can represent data as large as
8 bytes wide. The question is: Which add instruction should be used? In the addition of
multibyte (or multiword) numbers, the ADC instruction is always used since the carry
must be added to the next-higher byte (or word) in the next iteration. Before executing
ADC, the carry flag must be cleared (CF = 0) so that in the first iteration, the carry would
not be added. Clearing the carry flag is achieved by the CLC (clear carry) instruction.
Three pointers have been used: SI for DATA1, DI for DATA2, and BX for DATA3 where
the result is saved. There is a new instruction in that program, "LOOP XXXX", which
replaces the often used "DEC CX" and "JNZ XXXX". In other words:


LOOP xxxx ;is equivalent to the following two instructions 
DEC CX 
JNZ XXXX 


When "LOOP xxxx" is executed, CX is decremented automatically, and if CX 
is not 0, the microprocessor will jump to target address xxxx. If CX is 0, the next instruc- 
tion (the one below "LOOP xxxx") is executed. 
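The word-at-a-time addition with a carry passed between iterations can be sketched in C as follows. The names add_multiword and check_program_3_2 are assumptions made for this illustration; the arrays hold the operands least significant word first, matching the order in which the loop walks through memory, and the `for` loop plays the role of CX and LOOP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two multiword numbers stored least significant word first.
   Returns the carry out of the most significant word. */
unsigned add_multiword(const uint16_t *a, const uint16_t *b,
                       uint16_t *sum, size_t words)
{
    unsigned cf = 0;                                   /* CLC */
    for (size_t i = 0; i < words; i++) {               /* CX = words, LOOP BACK */
        uint32_t wide = (uint32_t)a[i] + b[i] + cf;    /* ADC AX,[DI] */
        sum[i] = (uint16_t)wide;                       /* MOV [BX],AX */
        cf = (unsigned)(wide >> 16);                   /* carry into next word */
    }
    return cf;
}

/* 548FB9963CE7H + 3FCD4FA23B8DH, compared against a 64-bit addition. */
void check_program_3_2(void)
{
    const uint16_t data1[4] = {0x3CE7, 0xB996, 0x548F, 0x0000};
    const uint16_t data2[4] = {0x3B8D, 0x4FA2, 0x3FCD, 0x0000};
    uint16_t data3[4];
    unsigned cf = add_multiword(data1, data2, data3, 4);

    uint64_t got = 0;
    for (int i = 3; i >= 0; i--)
        got = (got << 16) | data3[i];
    assert(got == 0x548FB9963CE7ULL + 0x3FCD4FA23B8DULL);
    assert(cf == 0);
}
```

Starting with cf = 0 corresponds to the CLC before the loop: without it, a stale carry would be added into the lowest word.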


Subtraction of unsigned numbers 


        SUB  dest,source        ;dest = dest - source

In subtraction, the x86 microprocessors (indeed, almost all modern CPUs) use the 
2's complement method. Although every CPU contains adder circuitry, it would be too 
cumbersome (and take too many transistors) to design separate subtractor circuitry. For 
this reason, the x86 uses internal adder circuitry to perform the subtraction command. 
Assuming that the x86 is executing simple subtract instructions, one can summarize the 
steps of the hardware of the CPU in executing the SUB instruction for unsigned numbers 
as follows. 

1. Take the 2's complement of the subtrahend (source operand). 
2. Add it to the minuend (destination operand). 
3. Invert the carry. 

These three steps are performed for every SUB instruction by the internal hard- 
ware of the x86 CPU regardless of the source and destination of the operands as long as 
the addressing mode is supported. It is after these three steps that the result is obtained and 
the flags are set. Example 3-2 illustrates the three steps. 


Example 3-2 


Show the steps involved in the following:
        MOV  AL,3FH     ;load AL=3FH
        MOV  BH,23H     ;load BH=23H
        SUB  AL,BH      ;subtract BH from AL. Place result in AL.
Solution:
          AL      3F      0011 1111        0011 1111
        - BH    - 23    - 0010 0011      + 1101 1101   (2's complement)
                  1C                   1 0001 1100   CF=0 (step 3)


The flags would be set as follows: CF = 0, ZF = 0, AF = 0, PF = 0, and SF = 0. 
The programmer must look at the carry flag (not the sign flag) to determine if the result is pos- 
itive or negative. 


After the execution of SUB, if CF = 0, the result is positive; if CF = 1, the result 
is negative and the destination has the 2's complement of the result. Normally, the result 
is left in 2's complement, but the NOT and INC instructions can be used to change it. The 
NOT instruction performs the 1's complement of the operand; then the operand is incre- 
mented to get the 2's complement. See Example 3-3. 
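The three hardware steps and the NOT/INC correction can be tried out in C. This is only a sketch: Sub8, sub8, magnitude, and check_subtraction are names assumed here for illustration. Steps 1 and 2 are folded into the single sum dest + NOT(src) + 1 so that the carry out is also correct when the source is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of an 8-bit SUB result and the carry flag. */
typedef struct {
    uint8_t result;
    int cf;
} Sub8;

/* Model of "SUB dest,src" for unsigned bytes. */
Sub8 sub8(uint8_t dest, uint8_t src)
{
    /* Steps 1 and 2: add the 2's complement of src, written as NOT(src) + 1. */
    unsigned wide = (unsigned)dest + (uint8_t)~src + 1u;
    Sub8 r;
    r.result = (uint8_t)wide;
    r.cf = !(wide > 0xFF);          /* Step 3: invert the carry out */
    return r;
}

/* If CF = 1 the result is negative and held in 2's complement;
   NOT followed by INC recovers its magnitude. */
uint8_t magnitude(Sub8 r)
{
    if (r.cf)
        return (uint8_t)(~r.result + 1);
    return r.result;
}

void check_subtraction(void)
{
    Sub8 a = sub8(0x3F, 0x23);      /* positive result */
    assert(a.result == 0x1C && a.cf == 0);

    Sub8 b = sub8(0x4C, 0x6E);      /* negative result */
    assert(b.result == 0xDE && b.cf == 1);
    assert(magnitude(b) == 0x22);
}
```

The second case shows why the carry flag, not the sign flag, is examined for unsigned data: DEH is simply the 2's complement form of 22H.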


SBB (subtract with borrow)


This instruction is used for multibyte (multiword) numbers and will take care of 
the borrow of the lower operand. If the carry flag is 0, SBB works like SUB. If the carry 
flag is 1, SBB subtracts 1 from the result. Notice the "PTR" operand in Example 3-4. The 
PTR (pointer) data specifier directive is widely used to specify the size of the operand 
when it differs from the defined size. In Example 3-4, "WORD PTR" tells the assembler 
to use a word operand, even though the data is defined as a doubleword. 


Example 3-3

Analyze the following program:

;from the data segment:
DATA1   DB   4CH
DATA2   DB   6EH
DATA3   DB   ?

;from the code segment:
        MOV  DH,DATA1   ;load DH with DATA1 value (4CH)
        SUB  DH,DATA2   ;subtract DATA2 (6E) from DH (4CH)
        JNC  NEXT       ;if CF=0 jump to NEXT target
        NOT  DH         ;if CF=1 then take 1's complement
        INC  DH         ;and increment to get 2's complement
NEXT:   MOV  DATA3,DH   ;save DH in DATA3

Solution:

Following the three steps for "SUB DH,DATA2":
          4C      0100 1100        0100 1100
        - 6E      0110 1110      + 1001 0010   (2's complement)
        - 22                     0 1101 1110   CF=1 (step 3) result is negative


Example 3-4 


Analyze the following program: 
DATA_A  DD   62562FAH
DATA_B  DD   412963BH
RESULT  DD   ?

        MOV  AX,WORD PTR DATA_A         ;AX=62FA
        SUB  AX,WORD PTR DATA_B         ;SUB 963B from AX
        MOV  WORD PTR RESULT,AX         ;save the result
        MOV  AX,WORD PTR DATA_A+2       ;AX=0625
        SBB  AX,WORD PTR DATA_B+2       ;SUB 0412 with borrow
        MOV  WORD PTR RESULT+2,AX       ;save the result


Solution: 
After the SUB, AX = 62FA — 963B = CCBF and the carry flag is set. Since CF = 1, when SBB 
is executed, AX = 625 — 412 — 1 = 212. Therefore, the value stored in RESULT is 0212CCBF. 
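The SUB/SBB pairing can be modeled in C by subtracting the low words first and passing the borrow to the high words. This is a sketch with assumed names (Sub16, sbb16, sub32, check_example_3_4); SUB is treated as SBB with an incoming borrow of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of a 16-bit subtraction result and the carry flag. */
typedef struct {
    uint16_t result;
    unsigned cf;
} Sub16;

/* dest - src - borrow_in on 16-bit operands; cf reports the borrow out. */
Sub16 sbb16(uint16_t dest, uint16_t src, unsigned borrow_in)
{
    uint32_t wide = (uint32_t)dest - src - borrow_in;
    Sub16 r;
    r.result = (uint16_t)wide;
    r.cf = (wide >> 16) & 1u;       /* a borrow wraps around into bit 16 */
    return r;
}

/* Doubleword subtraction built from one SUB and one SBB. */
uint32_t sub32(uint32_t a, uint32_t b)
{
    Sub16 lo = sbb16((uint16_t)a, (uint16_t)b, 0);                       /* SUB */
    Sub16 hi = sbb16((uint16_t)(a >> 16), (uint16_t)(b >> 16), lo.cf);   /* SBB */
    return ((uint32_t)hi.result << 16) | lo.result;
}

/* 062562FAH - 0412963BH = 0212CCBFH */
void check_example_3_4(void)
{
    Sub16 lo = sbb16(0x62FA, 0x963B, 0);
    assert(lo.result == 0xCCBF && lo.cf == 1);
    assert(sub32(0x062562FA, 0x0412963B) == 0x0212CCBF);
}
```

The low-word subtraction borrows, so the high-word step computes 625 - 412 - 1, which is exactly what SBB contributes over a plain SUB.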


Review Questions 


1. The ADD instruction that has the syntax "ADD destination, source" replaces the
   ________ operand with the sum of the two operands.
2. Why is the instruction "ADD DATA_1,DATA_2" illegal?
3. Rewrite the instruction above in a correct form.
4. The ADC instruction that has the syntax "ADC destination, source" replaces the
   ________ operand with the sum of ________.
5. The execution of part (a) below results in ZF = 1, whereas the execution of part (b)
   results in ZF = 0. Explain why.
   (a) MOV BL,04FH      (b) MOV BX,04FH
       ADD BL,0B1H          ADD BX,0B1H
6. The instruction "LOOP ADD_LOOP" is equivalent to what two instructions?
7. Show how the CPU would subtract 05H from 43H.
8. If CF = 1, AL = 95, and BL = 4F prior to the execution of "SBB AL,BL", what will
   be the contents of AL after the subtraction?




SECTION 3.2: UNSIGNED MULTIPLICATION AND DIVISION 


One of the major changes from the 8080/85 microprocessor to the 8086 was the
inclusion of instructions for multiplication and division. In this section we cover each one with
examples. This is multiplication and division of unsigned numbers. Signed numbers are
treated in Chapter 6.

In multiplying or dividing two numbers in the x86 microprocessor, the use of reg- 
isters AX, AL, AH, and DX is necessary since these functions assume the use of those reg- 
isters. 


Multiplication of unsigned numbers 


In discussing multiplication, the following cases will be examined: (1) byte times 
byte, (2) word times word, and (3) byte times word. 

byte x byte: In byte-by-byte multiplication, one of the operands must be in the 
AL register and the second operand can be either in a register or in memory as addressed 
by one of the addressing modes discussed in Chapter 1. After the multiplication, the result 
is in AX. See the following example: 


RESULT  DW   ?          ;result is defined in the data segment
        MOV  AL,25H     ;a byte is moved to AL
        MOV  BL,65H     ;immediate data must be in a register
        MUL  BL         ;AX = AL x BL = 25H x 65H
        MOV  RESULT,AX  ;the result is saved


In the program above, 25H is multiplied by 65H and the result is saved in word- 
sized memory named RESULT. In that example, the register addressing mode was used. 
The next three examples show the register, direct, and register indirect addressing modes. 
;from the data segment:
DATA1   DB   25H
DATA2   DB   65H
RESULT  DW   ?

;from the code segment:
        MOV  AL,DATA1
        MOV  BL,DATA2
        MUL  BL                 ;register addressing mode
        MOV  RESULT,AX
or
        MOV  AL,DATA1
        MUL  DATA2              ;direct addressing mode
        MOV  RESULT,AX
or
        MOV  AL,DATA1
        MOV  SI,OFFSET DATA2
        MUL  BYTE PTR [SI]      ;register indirect addressing mode
        MOV  RESULT,AX


In the register addressing mode example, any 8-bit register could have been used 
in place of BL. Similarly, in the register indirect example, BX or DI could have been used 
as pointers. If the register indirect addressing mode is used, the operand size must be spec- 
ified with the help of the PTR pseudo-instruction. In the absence of the "BYTE PTR" direc- 
tive in the example above, the assembler could not figure out if it should use a byte or 
word operand pointed at by SI. This confusion would cause an error. 

word x word: In word-by-word multiplication, one operand must be in AX and 
the second operand can be in a register or memory. After the multiplication, registers AX 
and DX will contain the result. Since word-by-word multiplication can produce a 32-bit 
result, AX will hold the lower word and DX the higher word. Example: 




DATA3 DW 2373H 


DATA4 DW LIEU Del 
RESULT1 DW   2 DUP(?)
        MOV  AX,DATA3           ;load first operand into AX
        MUL  DATA4              ;multiply it by the second operand
        MOV  RESULT1,AX         ;store the lower word result
        MOV  RESULT1+2,DX       ;store the higher word result


word x byte: This is similar to word-by-word multiplication except that AL con- 
tains the byte operand and AH must be set to zero. Example: 


;from the data segment:
DATA5   DB   6BH
DATA6   DW   12C3H
RESULT3 DW   2 DUP(?)

;from the code segment:
        MOV  AL,DATA5           ;AL holds byte operand
        SUB  AH,AH              ;AH must be cleared
        MUL  DATA6              ;byte in AL mult. by word operand
        MOV  BX,OFFSET RESULT3  ;BX points to product
        MOV  [BX],AX            ;AX holds lower word
        MOV  [BX]+2,DX          ;DX holds higher word

Table 3-1: Unsigned Multiplication Summary

Multiplication   Operand 1             Operand 2             Result
byte x byte      AL                    register or memory    AX
word x word      AX                    register or memory    DX AX
word x byte      AL = byte, AH = 0     register or memory    DX AX


Table 3-1 gives a summary of multiplication of unsigned numbers. 
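The register usage summarized for the three multiplication cases can be modeled in C. The following is a sketch only: the Regs structure and the names mul_byte, mul_word, mul_byte_by_word, and check_multiplication are assumptions made for illustration, with ax and dx standing in for the registers that receive the product.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair holding the DX:AX product of a word multiplication. */
typedef struct {
    uint16_t ax;    /* lower word of the product */
    uint16_t dx;    /* higher word of the product */
} Regs;

/* byte x byte: AL times an 8-bit operand, 16-bit product in AX. */
uint16_t mul_byte(uint8_t al, uint8_t operand)
{
    return (uint16_t)(al * operand);
}

/* word x word: AX times a 16-bit operand, 32-bit product in DX:AX. */
Regs mul_word(uint16_t ax, uint16_t operand)
{
    uint32_t product = (uint32_t)ax * operand;
    Regs r;
    r.ax = (uint16_t)product;
    r.dx = (uint16_t)(product >> 16);
    return r;
}

/* word x byte: the byte goes in AL with AH cleared, then a word multiply. */
Regs mul_byte_by_word(uint8_t al, uint16_t operand)
{
    uint16_t ax = al;               /* AH = 0 */
    return mul_word(ax, operand);
}

void check_multiplication(void)
{
    assert(mul_byte(0x25, 0x65) == 0x0E99);        /* 37 x 101 = 3737 */

    Regs r = mul_byte_by_word(0x6B, 0x12C3);       /* 107 x 4803 = 513921 */
    assert(r.dx == 0x0007 && r.ax == 0xD781);
}
```

Clearing AH in the word-by-byte case matters because the word form of MUL always uses all of AX; any leftover value in AH would become part of the multiplicand.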
Division of unsigned numbers 


In the division of unsigned numbers, the following cases are discussed: 


1. Byte over byte
2. Word over word
3. Word over byte
4. Doubleword over word


In divide, there could be cases where the CPU cannot perform the division. In 
these cases an interrupt is activated. This is referred to as an exception. In what situation 
can the microprocessor not handle the division and must call an interrupt? They are 


1. if the denominator is zero (dividing any number by 00), and 
2. if the quotient is too large for the assigned register. 


In the IBM PC and compatibles, if either of these cases happens, the PC will dis- 
play the "divide error" message. 


byte/byte: In dividing a byte by a byte, the numerator must be in the AL register 
and AH must be set to zero. The denominator cannot be immediate but can be in a regis- 
ter or memory as supported by the addressing modes. After the DIV instruction is per- 
formed, the quotient is in AL and the remainder is in AH. The following shows the vari- 
ous addressing modes that the denominator can take. 
DATA7    DB   95
DATA8    DB   10
QOUT1    DB   ?
REMAIN1  DB   ?

;using immediate addressing mode will give an error
         MOV  AL,DATA7          ;move data into AL
         SUB  AH,AH             ;clear AH
         DIV  10                ;immed. mode not allowed!!

;allowable modes include:
;using direct mode
         MOV  AL,DATA7          ;AL holds numerator
         SUB  AH,AH             ;AH must be cleared
         DIV  DATA8             ;divide AX by DATA8
         MOV  QOUT1,AL          ;quotient = AL = 09
         MOV  REMAIN1,AH        ;remainder = AH = 05
;using register addressing mode
         MOV  AL,DATA7          ;AL holds numerator
         SUB  AH,AH             ;AH must be cleared
         MOV  BH,DATA8          ;move denom. to register
         DIV  BH                ;divide AX by BH
         MOV  QOUT1,AL          ;quotient = AL = 09
         MOV  REMAIN1,AH        ;remainder = AH = 05
;using register indirect addressing mode
         MOV  AL,DATA7          ;AL holds numerator
         SUB  AH,AH             ;AH must be cleared
         MOV  BX,OFFSET DATA8   ;BX holds offset of DATA8
         DIV  BYTE PTR [BX]     ;divide AX by DATA8
         MOV  QOUT1,AL
         MOV  REMAIN1,AH

Table 3-2: Unsigned Division Summary

Division          Numerator             Denominator           Quotient   Rem.
byte/byte         AL = byte, AH = 0     register or memory    AL (1)     AH
word/word         AX = word, DX = 0     register or memory    AX (2)     DX
word/byte         AX = word             register or memory    AL (1)     AH
doubleword/word   DXAX = doubleword     register or memory    AX (2)     DX

Notes:
1. Divide error interrupt if the quotient is greater than FFH.
2. Divide error interrupt if the quotient is greater than FFFFH.

word/word: In this case the numerator is in AX and DX must be cleared. The 
denominator can be in a register or memory. After the DIV, AX will have the quotient and 
the remainder will be in DX. 


        MOV  AX,10050           ;AX holds numerator
        SUB  DX,DX              ;DX must be cleared
        MOV  BX,100             ;BX used for denominator
        DIV  BX
        MOV  QOUT2,AX           ;quotient = AX = 64H = 100
        MOV  REMAIND2,DX        ;remainder = DX = 32H = 50


word/byte: Again, the numerator is in AX and the denominator can be in a reg- 
ister or memory. After the DIV instruction, AL will contain the quotient, and AH will con- 
tain the remainder. The maximum quotient is FFH. The following program divides AX = 
2055 by CL = 100. Then AL = 14H (20 decimal) is the quotient and AH = 37H (55 deci- 
mal) is the remainder. 


        MOV  AX,2055            ;AX holds numerator
        MOV  CL,100             ;CL used for denominator
        DIV  CL
        MOV  QUO,AL             ;AL holds quotient
        MOV  REMI,AH            ;AH holds remainder


doubleword/word: The numerator is in AX and DX, with the most significant 
word in DX and the least significant word in AX. The denominator can be in a register 
or in memory. After the DIV instruction, the quotient will be in AX, and the remainder in 
DX. The maximum quotient is FFFFH. 


;from the data segment: 


DATA1 DD ISAS 2 

DATA2 DW 10000 

QUOT    DW   ?
REMAIN  DW   ?

;from the code segment:
        MOV  AX,WORD PTR DATA1          ;AX holds lower word
        MOV  DX,WORD PTR DATA1+2        ;DX higher word of numerator
DIV DATA2 
MOV QUOT,AX ;AX holds quotient 
MOV REMAIN, DX ;DX holds remainder 


In the program above, the contents of DX:AX are divided by a word-sized data 
value, 10000. Now one might ask: How does the CPU know that it must use the double- 
word in DX:AX for the numerator? The 8088/86 automatically uses DX:AX as the numer- 
ator anytime the denominator is a word in size, as was seen earlier in the case of a word 
divided by a word. This explains why DX had to be cleared in that case. Notice in the 
example above that DATA1 is defined as DD but fetched into a word-size register with the
help of WORD PTR. In the absence of WORD PTR, the assembler will generate an error. 
A summary of the results of division of unsigned numbers is given in Table 3-2. 
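The two forms of DIV and the two conditions that raise a divide error can be modeled in C. This is a sketch under assumptions: DivResult, div_by_byte, div_by_word, and check_division are names invented here, the divide_error field stands in for the interrupt, and the doubleword numerator 105432 in the last check is a value chosen purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result record; divide_error models the exception. */
typedef struct {
    bool divide_error;
    uint16_t quotient;
    uint16_t remainder;
} DivResult;

/* Byte-sized denominator: AX / operand, quotient in AL, remainder in AH. */
DivResult div_by_byte(uint16_t ax, uint8_t operand)
{
    DivResult r = { true, 0, 0 };
    if (operand == 0)
        return r;                       /* denominator is zero */
    uint16_t q = (uint16_t)(ax / operand);
    if (q > 0xFF)
        return r;                       /* quotient too large for AL */
    r.divide_error = false;
    r.quotient = q;
    r.remainder = (uint16_t)(ax % operand);
    return r;
}

/* Word-sized denominator: DX:AX / operand, quotient in AX, remainder in DX. */
DivResult div_by_word(uint16_t dx, uint16_t ax, uint16_t operand)
{
    DivResult r = { true, 0, 0 };
    if (operand == 0)
        return r;                       /* denominator is zero */
    uint32_t numerator = ((uint32_t)dx << 16) | ax;
    uint32_t q = numerator / operand;
    if (q > 0xFFFF)
        return r;                       /* quotient too large for AX */
    r.divide_error = false;
    r.quotient = (uint16_t)q;
    r.remainder = (uint16_t)(numerator % operand);
    return r;
}

void check_division(void)
{
    DivResult a = div_by_byte(95, 10);              /* byte/byte, AH = 0 */
    assert(!a.divide_error && a.quotient == 9 && a.remainder == 5);

    DivResult b = div_by_word(0, 10050, 100);       /* word/word, DX = 0 */
    assert(!b.divide_error && b.quotient == 100 && b.remainder == 50);

    DivResult c = div_by_byte(2055, 100);           /* word/byte */
    assert(!c.divide_error && c.quotient == 20 && c.remainder == 55);

    uint32_t n = 105432;                            /* illustrative doubleword */
    DivResult d = div_by_word((uint16_t)(n >> 16), (uint16_t)n, 10000);
    assert(!d.divide_error && d.quotient == 10 && d.remainder == 5432);
}
```

Only two functions are needed for the four cases because the denominator size alone selects the numerator: a byte denominator always divides AX, and a word denominator always divides DX:AX, which is why AH or DX must be cleared for the smaller numerators.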


Review Questions 
1. In unsigned multiplication of a byte in DATA1 with a byte in AL, the product will be
   placed in register(s) ________.
2. In unsigned multiplication of AX with BX, the product is placed in register(s)
   ________.
3. In unsigned multiplication of CX with a byte in AL, the product is placed in register(s)
   ________.
4. In unsigned division of a byte in AL by a byte in DH, the quotient will be placed in
   ________ and the remainder in ________.
5. In unsigned division of a word in AX by a word in DATA1, the quotient will be placed
   in ________ and the remainder in ________.
6. In unsigned division of a word in AX by a byte in DATA2, the quotient will be placed
   in ________ and the remainder in ________.
7. In unsigned division of a doubleword in DXAX by a word in CX, the quotient will be
   placed in ________ and the remainder in ________.


SECTION 3.3: LOGIC INSTRUCTIONS 


In this section we discuss the logic instructions AND, OR, XOR, SHIFT, and
COMPARE in the context of many examples.

AND


AND destination, source 

This instruction will perform a logical AND on the operands 
and place the result in the destination. The destination operand can be 
a register or in memory. The source operand can be a register, in mem- 
ory, or immediate. 

AND will automatically change the CF and OF to zero, and 
PF, ZF, and SF are set according to the result. The rest of the flags are 
either undecided or unaffected. As seen in Example 3-5, AND can be 
used to mask certain bits of the operand. It can also be used to test for 
a zero operand: 


        AND  DH,DH
        JZ   XXXX
        ...
XXXX:   ...


Logical AND Function

Inputs    Output
X   Y     X AND Y
0   0        0
0   1        0
1   0        0
1   1        1


The above code will AND DH with itself and set ZF = 1 if the result is zero, making
the CPU fetch from the target address XXXX. Otherwise, the instruction below JZ is
executed. AND can thus be used to test if a register contains zero.
Example 3-5

Show the results of the following:
        MOV  BL,35H
        AND  BL,0FH     ;AND BL with 0FH. Place the result in BL.

Solution:
        35H   0011 0101
        0FH   0000 1111
        05H   0000 0101   Flag settings will be: SF = 0, ZF = 0, PF = 1, CF = OF = 0.


OR

OR destination,source


The destination and source operands are ORed and the result is placed in the
destination. OR can be used to set certain bits of an operand to 1. The
destination operand can be a register or in memory. The source 
operand can be a register, in memory, or immediate. 

The flags will be set the same as for the AND instruction. CF 
and OF will be reset to zero, and SF, ZF, and PF will be set according 
to the result. All other flags are not affected. See Example 3-6. 

The OR instruction can also be used to test for a zero 
operand. For example, "OR BL, 0" will OR the register BL with 0 and 
make ZF = 1 if BL is zero. "OR BL, BL" will achieve the same result. 


Logical OR Function

Inputs    Output
X   Y     X OR Y
0   0        0
0   1        1
1   0        1
1   1        1




Example 3-6

Show the results of the following:
        MOV  AX,0504H    ;AX = 0504
        OR   AX,0DA68H   ;AX = DF6C

Solution:
        0504H   0000 0101 0000 0100
        DA68H   1101 1010 0110 1000   Flags will be: SF = 1, ZF = 0, PF = 1, CF = OF = 0.
        DF6CH   1101 1111 0110 1100   Notice that parity is checked for the lower 8 bits only.


XOR

XOR dest,src

The XOR instruction will eXclusive-OR the operands and place the result in the
destination. XOR sets the result bits to 1 if they are not equal; otherwise, they
are reset to 0. The flags are set the same as for the AND instruction. CF = 0 and
OF = 0 are set internally and the rest are changed according to the result of the
operation. The rules for the operands are the same as in the AND and OR
instructions. See Examples 3-7 and 3-8.

Logical XOR Function

Inputs    Output
A   B     A XOR B
0   0        0
0   1        1
1   0        1
1   1        0

XOR can also be used to see if two registers have the same value. "XOR BX,CX"
will make ZF = 1 if both registers have the same value, and if they do, the result
(0000) is saved in BX, the destination.


Another widely used application of XOR is to toggle bits of an operand. For 
example, to toggle bit 2 of register AL: 


        XOR  AL,04H     ;XOR AL with 0000 0100


This would cause bit 2 of AL to change to the opposite value; all other bits would 
remain unchanged. 
Example 3-7

Show the results of the following:
        MOV  DH,54H
        XOR  DH,78H

Solution:
        54H   0101 0100
        78H   0111 1000
        2CH   0010 1100   Flag settings will be: SF = 0, ZF = 0, PF = 0, CF = OF = 0.


Example 3-8 


The XOR instruction can be used to clear the contents of a register by XORing it with itself. 
Show how "XOR AH,AH" clears AH, assuming that AH = 45H. 


Solution: 
        45H   01000101
        45H   01000101
        00    00000000   Flag settings will be: SF = 0, ZF = 1, PF = 1, CF = OF = 0.
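The flag settings quoted in Examples 3-5 through 3-8 can be reproduced with a small model. This is a Python sketch rather than assembly, and the helper names (logic_flags, and_op, or_op, xor_op) are hypothetical; it assumes only the rules stated above: CF and OF are cleared, and SF, ZF, and PF follow the result, with parity taken over the lower 8 bits.

```python
def logic_flags(result, bits=8):
    """Flags left by AND, OR, and XOR for a result of the given width."""
    result &= (1 << bits) - 1
    return {
        "SF": result >> (bits - 1),                         # copy of the MSB
        "ZF": int(result == 0),
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # low byte, even parity
        "CF": 0,
        "OF": 0,
    }


def and_op(dest, src, bits=8):
    result = dest & src
    return result, logic_flags(result, bits)


def or_op(dest, src, bits=8):
    result = dest | src
    return result, logic_flags(result, bits)


def xor_op(dest, src, bits=8):
    result = dest ^ src
    return result, logic_flags(result, bits)


examples = [
    ("Example 3-5", and_op(0x35, 0x0F)),
    ("Example 3-6", or_op(0x0504, 0xDA68, bits=16)),
    ("Example 3-7", xor_op(0x54, 0x78)),
    ("Example 3-8", xor_op(0x45, 0x45)),
]
for name, (value, flags) in examples:
    print(name, format(value, "X"), flags)
```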




SHIFT 


There are two kinds of shifts: logical and arithmetic. The logical shift is for 
unsigned operands, and the arithmetic shift is for signed operands. Logical shift will be 
discussed in this section and the discussion of arithmetic shift is postponed to Chapter 6. 
Using shift instructions shifts the contents of a register or memory location right or left. 
The number of times (or bits) that the operand is shifted can be specified directly if it is 
once only, or through the CL register if it is more than once. 

SHR: This is the logical shift right. The operand is shifted right bit by bit, and for
every shift the LSB (least significant bit) will go to the carry flag (CF) and the MSB
(most significant bit) is filled with 0. Examples 3-9 and 3-10 should help to clarify SHR.

        0 --> MSB ... LSB --> CF


Example 3-9

Show the result of SHR in the following:
        MOV  AL,9AH
        MOV  CL,3       ;set number of times to shift
        SHR  AL,CL

Solution:
        9AH = 10011010
              01001101  CF = 0 (shifted once)
              00100110  CF = 1 (shifted twice)
              00010011  CF = 0 (shifted three times)
After shifting right three times, AL = 13H and CF = 0.


If the operand is to be shifted once only, this is specified in the SHR instruction 
itself rather than placing 1 in the CL. This saves coding of one instruction: 


        MOV  BX,0FFFFH  ;BX=FFFFH
        SHR  BX,1       ;shift right BX once only


After the shift above, BX = 7FFFH and CF = 1. Although SHR does affect the OF, 
SF, PF, and ZF flags, they are not important in this case. The operand to be shifted can be 
in a register or in memory, but immediate addressing mode is not allowed for shift instruc- 
tions. For example, "SHR 25, CL" will cause the assembler to give an error. 


Example 3-10

Show the results of SHR in the following:
;from the data segment:
DATA1   DW   7777H
;from the code segment:
TIMES   EQU  4
        MOV  CL,TIMES   ;CL=04
        SHR  DATA1,CL   ;shift DATA1 CL times

Solution:

After the four shifts, the word at memory location DATA1 will contain 0777H. The four LSBs
are lost through the carry, one by one, and 0s fill the four MSBs.




SHL: Shift left is also a logical shift. It is the reverse of SHR. After every
shift, the LSB is filled with 0 and the MSB goes to CF. All the rules are the
same as for SHR.

        CF <-- MSB ... LSB <-- 0


Example 3-11

Show the effects of SHL in the following:
        MOV  DH,6
        MOV  CL,4
        SHL  DH,CL

Solution:
                00000110
        CF=0    00001100  (shifted left once)
        CF=0    00011000
        CF=0    00110000
        CF=0    01100000  (shifted four times)
After the four shifts left, the DH register has 60H and CF = 0.


Example 3-11 could have been coded as 


        MOV  DH,6
        SHL  DH,1
        SHL  DH,1
        SHL  DH,1
        SHL  DH,1
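The bit-by-bit behavior of SHR and SHL in Examples 3-9 to 3-11 can be traced with a short model. This is a Python sketch, not assembly; the function names shr and shl are hypothetical, and the model assumes a shift count of at least 1 (it reports CF = 0 for a count of 0, whereas the real CPU would leave CF unchanged).

```python
def shr(value, count, bits=8):
    """Logical shift right: returns (result, CF) after 'count' single-bit shifts."""
    value &= (1 << bits) - 1
    cf = 0
    for _ in range(count):
        cf = value & 1          # the LSB goes to the carry flag
        value >>= 1             # the MSB is filled with 0
    return value, cf


def shl(value, count, bits=8):
    """Logical shift left: returns (result, CF) after 'count' single-bit shifts."""
    mask = (1 << bits) - 1
    value &= mask
    cf = 0
    for _ in range(count):
        cf = (value >> (bits - 1)) & 1   # the MSB goes to the carry flag
        value = (value << 1) & mask      # the LSB is filled with 0
    return value, cf


print(shr(0x9A, 3))             # Example 3-9
print(shr(0xFFFF, 1, bits=16))  # BX shifted right once
print(shl(0x06, 4))             # Example 3-11
```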


COMPARE of unsigned numbers 


CMP destination,source ;compare dest and src 


The CMP instruction compares two operands and changes the flags according to 
the result of the comparison. The operands themselves remain unchanged. The destination 
operand can be in a register or in memory and the source operand can be in a register, in 
memory, or immediate. Although all the CF, AF, SF, PF, ZF, and OF flags reflect the result 
of the comparison, only CF and ZF are used, as outlined in Table 3-3. 


Table 3-3: Flag Settings for Compare Instruction

Compare operands          CF   ZF
destination > source      0    0
destination = source      0    1
destination < source      1    0


The following demonstrates how the CMP instruction is used: 


DATA1   DW   235FH
        MOV  AX,0CCCCH
        CMP  AX,DATA1   ;compare CCCC with 235F
        JNC  OVER       ;jump if CF=0
        SUB  AX,AX
OVER:   INC  DATA1


In the program above, AX is greater than the contents of memory location DATA1
(0CCCCH > 235FH); therefore, CF = 0 and JNC (jump no carry) will go to target OVER.
In contrast, look at the following: 


        MOV  BX,7888H
        MOV  CX,9FFFH
        CMP  BX,CX      ;compare 7888 with 9FFF
        JNC  NEXT
        ADD  BX,4000H
NEXT:   ADD  CX,250H


In the above, BX is smaller than CX (7888H < 9FFFH), which sets CF = 1, making
"JNC NEXT" fall through so that "ADD BX,4000H" is executed. In the example above,
CX and BX still have their original values (CX = 9FFFH and BX = 7888H) after the exe- 
cution of "CMP BX, CX". Notice that CF is always checked for cases of greater or smaller 
than, but for equal, ZF must be used. The next program sample has a variable named 
TEMP, which is being checked to see if it has reached 99: 


TEMP     DB   ?
         MOV  AL,TEMP   ;move the TEMP variable into AL
         CMP  AL,99     ;compare AL with 99
         JZ   HOT_HOT   ;if ZF=1 (TEMP = 99) jump to HOT_HOT
         INC  BX        ;otherwise (ZF=0) increment BX
HOT_HOT: HLT            ;halt the system


The compare instruction is really a SUBtraction except that the values of the 
operands do not change. The flags are changed according to the execution of SUB. 
Although all the flags are affected, the only ones of interest are ZF and CF. It must be 
emphasized that in CMP instructions, the operands are unaffected regardless of the result 
of the comparison. Only the flags are affected. This is despite the fact that CMP uses the 
SUB operation to set or reset the flags. Program 3-3 uses the CMP instruction to search 
for the highest byte in a series of 5 bytes defined in the data segment. The instruction "CMP
AL,[BX]" works as follows, where [BX] is the contents of the memory location pointed
at by register BX.


If AL < [BX], then CF = 1 and [BX] becomes the basis of the new comparison. 
If AL > [BX], then CF = 0 and AL is the larger of the two values and remains the basis 
of comparison. 

Although JC (jump carry) and JNC (jump no carry) check the carry flag and can
be used after a compare instruction, it is recommended that JA (jump above) and JB (jump
below) be used for two reasons. One reason is that unassemblers such as DEBUG display JC
as JB, and JNC as JNB (jump not below, the same instruction as JAE), which may be confusing
to beginning programmers. Another reason is that "jump above" and "jump below" are easier
to understand than "jump carry" and "jump no carry," since it is more immediately apparent
that one number is larger than another, than whether a carry would be generated if the two
numbers were subtracted. Note that JA also requires ZF = 0, so unlike JNC it is not taken
when the two operands are equal.
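The CF and ZF settings of Table 3-3 and the jump conditions discussed above can be modeled in a few lines. This is a Python sketch with hypothetical function names, assuming unsigned operands; it is meant only to show how the unsigned jumps read the two flags after a compare.

```python
def cmp_flags(dest, src, bits=16):
    """Return (CF, ZF) as CMP dest,src leaves them for unsigned operands."""
    mask = (1 << bits) - 1
    dest &= mask
    src &= mask
    cf = int(dest < src)     # the subtraction needs a borrow
    zf = int(dest == src)    # the difference is zero
    return cf, zf


def jnc_taken(cf, zf):
    return cf == 0                 # same test as JNB/JAE


def ja_taken(cf, zf):
    return cf == 0 and zf == 0     # strictly above


def jb_taken(cf, zf):
    return cf == 1                 # same test as JC


for dest, src in [(0xCCCC, 0x235F), (0x7888, 0x9FFF), (99, 99)]:
    flags = cmp_flags(dest, src)
    print(format(dest, "X"), format(src, "X"), flags,
          "JNC" if jnc_taken(*flags) else "-",
          "JA" if ja_taken(*flags) else "-",
          "JB" if jb_taken(*flags) else "-")
```

The last pair shows the one case where JNC and JA differ: for equal operands JNC is taken but JA is not.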

Program 3-3 searches through five data items to find the highest grade. The pro- 
gram has a variable called “Highest” that holds the highest grade found so far. One by one, 
the grades are compared to Highest. If any of them is higher, that value is placed in 
Highest. This continues until all data items are checked. A REPEAT-UNTIL structure was
chosen in the program design. Figure 3-1 shows the flowchart for Program 3-3. This 
design could be used to code the program in many different languages. 
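As one illustration of coding the same design in another language, here is a Python sketch of the REPEAT-UNTIL search. The function name find_highest is hypothetical; like the REPEAT-UNTIL design, the loop body runs at least once, so the sketch assumes the list holds at least one grade.

```python
def find_highest(grades):
    """REPEAT-UNTIL search for the highest grade, mirroring the design above."""
    count = len(grades)      # loop counter
    highest = 0              # highest grade found so far
    index = 0                # plays the role of the pointer into the data
    while True:              # REPEAT
        next_grade = grades[index]
        if next_grade > highest:
            highest = next_grade
        index += 1           # point to the next grade
        count -= 1           # decrement the counter
        if count == 0:       # UNTIL Count = 0
            break
    return highest


print(find_highest([69, 87, 96, 45, 75]))
```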

Program 3-3 as coded in Assembly language, uses the CMP instruction to search 
through 5 bytes of data to find the highest grade. The program uses register AL to hold the 
highest grade found so far. AL is given the initial value of 0. A loop is used to compare 
each of the 5 bytes with the value in AL. If AL contains a higher value, the loop contin- 
ues to check the next byte. If AL is smaller than the byte being checked, the contents of 


Assume that there is a class of five people with the following grades: 69, 87, 96, 45, and 75. 
Find the highest grade. 


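TITLE   PROG3-3 (EXE) CMP EXAMPLE
PAGE    60,132
.MODEL SMALL
.STACK 64
;--------------
        .DATA
GRADES  DB   69,87,96,45,75
HIGHEST DB   ?
;--------------
        .CODE
MAIN    PROC FAR
        MOV  AX,@DATA
        MOV  DS,AX
        MOV  CX,5               ;set up loop counter
        MOV  BX,OFFSET GRADES   ;BX points to GRADE data
        SUB  AL,AL              ;AL holds highest grade found so far
AGAIN:  CMP  AL,[BX]            ;compare next grade to highest
        JA   NEXT               ;jump if AL still highest
        MOV  AL,[BX]            ;else AL holds new highest
NEXT:   INC  BX                 ;point to next grade
        LOOP AGAIN              ;continue search
        MOV  HIGHEST,AL         ;store highest grade
        MOV  AH,4CH
        INT  21H                ;go back to OS
MAIN    ENDP
        END  MAIN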


Program 3-3 


AL are replaced by that byte and the loop continues. 
Program 3-4 uses the CMP instruction to determine if an ASCII character is 
uppercase or lowercase. Note that small and capital letters in ASCII have the following 


values: 

Letter  Hex  Binary        Letter  Hex  Binary
A       41   0100 0001     a       61   0110 0001
B       42   0100 0010     b       62   0110 0010
C       43   0100 0011     c       63   0110 0011
...     ...  ...           ...     ...  ...
Y       59   0101 1001     y       79   0111 1001
Z       5A   0101 1010     z       7A   0111 1010


As can be seen, there is a relationship between the pattern of lowercase and upper- 
case letters, as shown below for A and a: 


A 0100 0001 41H 
a 0110 0001 61H 


The only bit that changes is d5. To change from lowercase to uppercase, d5 must 
be masked. Program 3-4 first detects if the letter is in lowercase, and if it is, it is ANDed 
with 1101 1111B = DFH. Otherwise, it is simply left alone. To determine if it is a lower- 
case letter, it is compared with 61H and 7AH to see if it is in the range a to z. Anything 
above or below this range should be left alone. 
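The range check and the d5 mask described above can be tried out on byte values directly. This is a Python sketch with hypothetical function names; it also includes the alternative of subtracting 20H, and the sample string is made up for illustration.

```python
def to_upper_and(code):
    """Clear bit d5 when the byte is a lowercase letter ('a' = 61H to 'z' = 7AH)."""
    if 0x61 <= code <= 0x7A:
        return code & 0xDF      # AND with 1101 1111B
    return code                 # anything outside the range is left alone


def to_upper_sub(code):
    """Same conversion done by subtracting 20H, the difference 'a' - 'A'."""
    if 0x61 <= code <= 0x7A:
        return code - 0x20
    return code


sample = b"hello, World 42!"    # hypothetical test data
print(bytes(to_upper_and(b) for b in sample))
```

For bytes in the lowercase range, bit d5 is always 1, so clearing it and subtracting 20H give the same result.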




[Flowchart: set Count = 5 and Highest = 0; test "Next > Highest?" and, if yes, replace
Highest with Next; decrement count and increment pointer; repeat until Count = 0; then
store Highest.]

Count = 5
Highest = 0
REPEAT
    IF (Next > Highest)
    THEN
        Highest = Next
    ENDIF
    Decrement Count
UNTIL Count = 0
Store Highest

Figure 3-1. Flowchart and Pseudocode for Program 3-3


In Program 3-4, 20H could have been subtracted from the lowercase letters 
instead of ANDing with 1101 1111B. That is what IBM did in their BIOS, as shown next. 


IBM BIOS method of converting from lowercase to uppercase 


            2357 ;----- CONVERT ANY LOWERCASE TO UPPERCASE
            2358
EBFB        2359 K60:                       ;LOWER TO UPPER
EBFB 3C61   2360        CMP  AL,'a'         ;FIND OUT IF ALPHABETIC
EBFD 7206   2361        JB   K61            ;NOT_CAPS_STATE
EBFF 3C7A   2362        CMP  AL,'z'
EC01 7702   2363        JA   K61            ;NOT_CAPS_STATE
EC03 2C20   2364        SUB  AL,'a'-'A'     ;CONVERT TO UPPERCASE
EC05        2365 K61:




TITLE   PROG3-4 (EXE) LOWERCASE TO UPPERCASE CONVERSION
PAGE    60,132
.MODEL SMALL
.STACK 64

        MOV  AX,@DATA
        MOV  DS,AX
        MOV  SI,OFFSET DATA1    ;SI points to original data
        MOV  BX,OFFSET DATA2    ;BX points to uppercase data
        MOV  CX,14              ;CX is loop counter
BACK:   MOV  AL,[SI]            ;get next character
        CMP  AL,61H             ;if less than 'a'
        JB   OVER               ;then no need to convert
        CMP  AL,7AH             ;if greater than 'z'
        JA   OVER               ;then no need to convert
        AND  AL,11011111B       ;mask d5 to convert to uppercase
OVER:   MOV  [BX],AL            ;store uppercase character
        INC  SI                 ;increment pointer to original
        INC  BX                 ;increment pointer to uppercase data
        LOOP BACK               ;continue looping if CX > 0
        MOV  AH,4CH
        INT  21H                ;go back to OS


Program 3-4 


Review Questions 


1. Use operands 4FCAH and C237H to perform:
   (a) AND (b) OR (c) XOR
2. ANDing a word operand with FFFFH will result in what value for the word operand?
   To set all bits of an operand to 0, it should be ANDed with ______.
3. To set all bits of an operand to 1, it could be ORed with ______.
4. XORing an operand with itself results in what value for the operand?
5. Show the steps if value A0F2H were shifted left three times. Then show the steps if
   A0F2H were shifted right three times.
6. The CMP instruction works by performing a(n) ______ operation on the operands
   and setting the flags accordingly.
7. True or false. The CMP instruction alters the contents of its operands.


BIOS examples of logic instructions 


Next we examine some real-life examples from the original IBM PC BIOS pro- 
grams. The purpose is to see the instructions discussed so far in the context of real-life 
applications. 

When the computer is turned on, the CPU starts to execute the programs stored in 
BIOS in order to set the computer up for the OS. If anything has happened to the BIOS 
programs, the computer can do nothing. The first subroutine of BIOS is to test the CPU. 
This involves checking the flag register bit by bit as well as checking all other registers. 
The BIOS program for testing the flags and registers is given below, followed
by its explanation:


      306         ASSUME CS:CODE,DS:NOTHING,ES:NOTHING,SS:NOTHING
E05B  307         ORG   0E05BH
E05B  310         CLI                  ; DISABLE INTERRUPTS
E05C  311         MOV   AH,0D5H        ; SET SF, CF, ZF, AND AF FLAGS ON
E05E  312         SAHF
E05F  313         JNC   ERR01          ; GO TO ERR ROUTINE IF CF NOT SET
E061  314         JNZ   ERR01          ; GO TO ERR ROUTINE IF ZF NOT SET
E063  315         JNP   ERR01          ; GO TO ERR ROUTINE IF PF NOT SET
E065  316         JNS   ERR01          ; GO TO ERR ROUTINE IF SF NOT SET
E067  317         LAHF                 ; LOAD FLAG IMAGE TO AH
E068  318         MOV   CL,5           ; LOAD CNT REG WITH SHIFT CNT
E06A  319         SHR   AH,CL          ; SHIFT AF INTO CARRY BIT POS
E06C  320         JNC   ERR01          ; GO TO ERR ROUTINE IF AF NOT SET
E06E  321         MOV   AL,40H         ; SET THE OF FLAG ON
E070  322         SHL   AL,1           ; SETUP FOR TESTING
E072  323         JNO   ERR01          ; GO TO ERR ROUTINE IF OF NOT SET
E074  324         XOR   AH,AH          ; SET AH = 0
E076  325         SAHF                 ; CLEAR SF, CF, ZF, AND PF
E077  326         JBE   ERR01          ; GO TO ERR ROUTINE IF CF ON
      327                              ; OR GO TO ERR ROUTINE IF ZF ON
E079  328         JS    ERR01          ; GO TO ERR ROUTINE IF SF ON
E07B  329         JP    ERR01          ; GO TO ERR ROUTINE IF PF ON
E07D  330         LAHF                 ; LOAD FLAG IMAGE TO AH
E07E  331         MOV   CL,5           ; LOAD CNT REG WITH SHIFT CNT
E080  332         SHR   AH,CL          ; SHIFT AF INTO CARRY BIT POS
E082  333         JC    ERR01          ; GO TO ERR ROUTINE IF ON
E084  334         SHL   AH,1           ; CHECK THAT OF IS CLEAR
E086  335         JO    ERR01          ; GO TO ERR ROUTINE IF ON
      336
      337 ;--- READ/WRITE THE 8088 GENERAL AND SEGMENTATION REGISTERS
      338 ;    WITH ALL ONES AND ZEROES.
      339
E088  340         MOV   AX,0FFFFH      ; SET UP ONE'S PATTERN IN AX
E08B  341         STC
E08C  342 C8:
E08C  343         MOV   DS,AX          ; WRITE PATTERN TO ALL REGS
E08E  344         MOV   BX,DS
E090  345         MOV   ES,BX
E092  346         MOV   CX,ES
E094  347         MOV   SS,CX
E096  348         MOV   DX,SS
E098  349         MOV   SP,DX
E09A  350         MOV   BP,SP
E09C  351         MOV   SI,BP
E09E  352         MOV   DI,SI
E0A0  353         JNC   C9
E0A2  354         XOR   AX,DI          ; PATTERN MAKE IT THRU ALL REGS
E0A4  355         JNZ   ERR01          ; NO - GO TO ERR ROUTINE
E0A6  356         CLC
E0A7  357         JMP   C8
E0A9  358 C9:
E0A9  359         OR    AX,DI          ; ZERO PATTERN MAKE IT THRU?
E0AB  360         JZ    C10            ; YES - GO TO NEXT TEST
E0AD  361 ERR01:  HLT                  ; HALT SYSTEM

(Reprinted by permission from "IBM Technical Reference" c. 1984 by International Business Machines Corporation)


Line-by-line explanation: 


Line      Explanation

310       CLI ensures that no interrupt will occur while the test is being
          conducted.
311       MOV AH,0D5H:
              flag   S  Z  -  AC -  P  -  C
              D5H    1  1  0  1  0  1  0  1
312       SAHF (store AH into lower byte of the flag register) is one way to move
          data to flags. Another is to use the stack:
              MOV  AX,00D5H
              PUSH AX
              POPF
          However, there is no RAM available yet to use for the stack because the
          CPU is tested before memory is tested.
313-316   Will make the CPU jump to HLT if any flag does not work.
317       LAHF (load AH with the lower byte of flag register) is the opposite of
          SAHF.
318       Loads CL for five shifts.
319       "SHR AH,CL". By shifting AH five times, AF (auxiliary carry) will be
          in the CF position.
320       If no AF, there is an error. Lines 317 to 320 are needed because there is
          no jump condition instruction for AF.
321-323   Checks the OF flag. This is discussed in Chapter 6 when signed numbers
          are discussed.
324-335   Checks the same flags for zero. Remember that JBE is the same as JNA:
          it jumps if CF = 1 or ZF = 1.
340       Loads AX with FFFFH.
341       STC (set the carry) makes CF = 1.
343-352   Moves the AX value (FFFFH) into every register and ends up with DI =
          FFFFH if the registers are good.
353       Since CF = 1 (remember STC), JNC C9 is not taken and execution falls
          through.
354       Exclusive-ORing AX and DI with both having the same FFFFH value
          makes AX = 0000 and ZF = 1 if the registers are good (see lines
          343-352). If ZF = 0, one of the registers must have corrupted the data
          FFFF, therefore the CPU is bad.
355       If ZF = 0, there is an error.
356       CLC clears the carry flag. This is the opposite of STC.
357       Jumps to C8 and repeats the same process, this time with value 0000. The
          contents of AX are moved around to every register until DI = 0000, and
          at 353 the JNC C9 will jump since CF = 0 by the CLC instruction before
          it went to the loop.
359       At C9, AX and DI are ORed. If 0000, the contents of AX are copied
          successfully to all registers, and DI will be 0000; therefore, ORing will
          raise the ZF, making ZF = 1.
360       If ZF = 1, the CPU is good and the system can perform the next test.
          Otherwise, ZF = 0, meaning that the CPU is bad and the system should
          be halted.
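The flag image used at line 311 and the shift trick at lines 317 to 320 can be checked with a small model. This is a Python sketch, not BIOS code; the names FLAG_BITS, decode_flag_image, and shr8 are hypothetical, and the model ignores the reserved bits of the flag register.

```python
# Bit positions of the flags in the lower byte of the flag register.
FLAG_BITS = {"SF": 7, "ZF": 6, "AF": 4, "PF": 2, "CF": 0}


def decode_flag_image(ah):
    """Return the flag values that SAHF would load from the byte in AH."""
    return {name: (ah >> bit) & 1 for name, bit in FLAG_BITS.items()}


def shr8(value, count):
    """Logical shift right of a byte; returns (result, CF) for count >= 1."""
    cf = 0
    for _ in range(count):
        cf = value & 1
        value >>= 1
    return value, cf


print(decode_flag_image(0xD5))   # every tested flag is 1
print(decode_flag_image(0x00))   # every tested flag is 0

# After LAHF, shifting AH right five times leaves bit 4 (AF) in CF,
# which is why JNC and JC can then be used to test AF.
print(shr8(0xD5, 5))
print(shr8(0x00, 5))
```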


SECTION 3.4: BCD AND ASCII CONVERSION 


This section covers BCD and ASCII conversions with some examples. 
BCD number system 


BCD stands for binary coded decimal. BCD is needed because we use the digits 
0 to 9 for numbers in everyday life. Binary representation of 0 to 
9 is called BCD (see Figure 3-2). In computer literature one 
encounters two terms for BCD numbers: (1) unpacked BCD, and 
(2) packed BCD. 


Unpacked BCD 


In unpacked BCD, the lower 4 bits of the number repre- 
sent the BCD number and the rest of the bits are 0. Example: 
"0000 1001" and "0000 0101" are unpacked BCD for 9 and 5, 
respectively. In the case of unpacked BCD it takes 1 byte of 
memory location or a register of 8 bits to contain the number. 


Digit   BCD
0       0000
1       0001
2       0010
3       0011
4       0100
5       0101
6       0110
7       0111
8       1000
9       1001

Figure 3-2. BCD Code


Packed BCD 


In the case of packed BCD, a single byte has two BCD numbers in it, one in the 
lower 4 bits and one in the upper 4 bits. For example, "0101 1001" is packed BCD for 59. 
It takes only 1 byte of memory to store the packed BCD operands. This is one reason to 
use packed BCD since it is twice as efficient in storing data. 


ASCII numbers 


In ASCII keyboards, when key "0" is activated, for example, "011 0000" (30H) is 
provided to the computer. In the same way, 31H (011 0001) is provided for key "1", and 
so on, as shown in the following list: 


Key   ASCII (hex)   Binary      BCD (unpacked)
0     30            011 0000    0000 0000
1     31            011 0001    0000 0001
2     32            011 0010    0000 0010
3     33            011 0011    0000 0011
4     34            011 0100    0000 0100
5     35            011 0101    0000 0101
6     36            011 0110    0000 0110
7     37            011 0111    0000 0111
8     38            011 1000    0000 1000
9     39            011 1001    0000 1001


It must be noted that although ASCII is standard in the United States (and many 
other countries), BCD numbers have universal application. Now since the keyboard, print- 
ers, and monitors are all in ASCII, how does data get converted from ASCII to BCD, and 
vice versa? These are the subjects covered next. 


ASCII to BCD conversion 


To process data in BCD, first the ASCII data provided by the keyboard must be 
converted to BCD. Whether it should be converted to packed or unpacked BCD depends 
on the instructions to be used. There are instructions that require that data be in unpacked 
BCD and there are others that must have packed BCD data to work properly. Each is cov- 
ered separately. 


ASCII to unpacked BCD conversion 


To convert ASCII data to BCD, the programmer must get rid of the tagged "011" 
in the higher 4 bits of the ASCII. To do that, each ASCII number is ANDed with "0000
1111" (0FH), as shown in the next example. This example is written in three different
ways using different addressing modes. Programs 3-5a, 3-5b, and 3-5c show three differ- 
ent methods for converting the 10 ASCII digits to unpacked BCD. All use the same data 
segment: 


ASC     DB   '9562481273'
        ORG  0010H
UNPACK  DB   10 DUP (?)


In Program 3-5a, notice that although the data was defined as DB, a byte defini- 
tion directive, it was accessed in word-sized chunks. This is a workable approach; how- 
ever, using the PTR directive as shown in Program 3-5b makes the code more readable for 
programmers. 

In both of the solutions so far, registers BX and DI were used as pointers for an 
array of data. An array is simply a set of data located in consecutive memory locations. 
Now one might ask: What happens if there are four, five, or six arrays? How can they all 
be accessed with only three registers as pointers: BX, DI, and SI? Program 3-5c shows 
how this can be done with a single register used as a pointer to access two arrays. 
However, to do that, the arrays must be of the same size and defined similarly. 


        MOV  CX,5               ;CX is the loop counter
        MOV  BX,OFFSET ASC      ;BX points to ASCII data
        MOV  DI,OFFSET UNPACK   ;DI points to unpacked BCD data
AGAIN:  MOV  AX,[BX]            ;move next 2 ASCII numbers to AX
        AND  AX,0F0FH           ;remove ASCII 3s
        MOV  [DI],AX            ;store unpacked BCD
        ADD  DI,2               ;point to next unpacked BCD data
        ADD  BX,2               ;point to next ASCII data
        LOOP AGAIN

Program 3-5a


        MOV  CX,5               ;CX is loop counter
        MOV  BX,OFFSET ASC      ;BX points to ASCII data
        MOV  DI,OFFSET UNPACK   ;DI points to unpacked BCD data
AGAIN:  MOV  AX,WORD PTR [BX]   ;move next 2 ASCII numbers to AX
        AND  AX,0F0FH           ;remove ASCII 3s
        MOV  WORD PTR [DI],AX   ;store unpacked BCD
        ADD  DI,2               ;point to next unpacked BCD data
        ADD  BX,2               ;point to next ASCII data
        LOOP AGAIN

Program 3-5b


Program 3-5c uses the based addressing mode since BX+ASC is used as a pointer.
ASC is the displacement added to BX. Either DI or SI could have been used for this
purpose. For word-sized operands, "WORD PTR" would be used since the data is defined
as DB. This is shown below.


        MOV  AX,WORD PTR ASC[BX]
        AND  AX,0F0FH
        MOV  WORD PTR UNPACK[BX],AX


        MOV  CX,10              ;load the counter
        SUB  BX,BX              ;clear BX
AGAIN:  MOV  AL,ASC[BX]         ;move to AL content of mem [BX+ASC]
        AND  AL,0FH             ;mask the upper nibble
        MOV  UNPACK[BX],AL      ;move to mem [BX+UNPACK] the AL
        INC  BX                 ;point to next byte
        LOOP AGAIN              ;loop until it is finished

Program 3-5c
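The masking step shared by Programs 3-5a, 3-5b, and 3-5c can be checked against the ASC data with a short model. This is a Python sketch with hypothetical function names: one function masks a byte at a time with 0FH, the other masks two bytes at a time with 0F0FH, and both are expected to give the same unpacked BCD bytes.

```python
def unpack_bytes(ascii_digits):
    """Byte-at-a-time version: AND each ASCII digit with 0FH."""
    return bytes(b & 0x0F for b in ascii_digits)


def unpack_words(ascii_digits):
    """Word-at-a-time version: AND each pair of ASCII digits with 0F0FH.

    Assumes an even number of digits, as in the 10-digit ASC data.
    """
    out = bytearray()
    for i in range(0, len(ascii_digits), 2):
        ax = int.from_bytes(ascii_digits[i:i + 2], "little")  # MOV AX,[BX]
        ax &= 0x0F0F                                          # AND AX,0F0FH
        out += ax.to_bytes(2, "little")                       # MOV [DI],AX
    return bytes(out)


asc = b"9562481273"
print(list(unpack_bytes(asc)))
print(unpack_bytes(asc) == unpack_words(asc))
```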


ASCII to packed BCD conversion 


To convert ASCII to packed BCD, it is first converted to unpacked BCD (to get 
rid of the 3) and then combined to make packed BCD. For example, for 9 and 5 the key- 
board gives 39 and 35, respectively. The goal is to produce 95H or "1001 0101", which is 
called packed BCD, as discussed earlier. This process is illustrated in detail below. 


Key   ASCII   Unpacked BCD   Packed BCD
4     34      00000100
7     37      00000111       01000111 or 47H


        ORG  0010H
VAL_ASC DB   '47'
VAL_BCD DB   ?
;reminder: DB will put 34 in 0010H location and 37 in 0011H
        MOV  AX,WORD PTR VAL_ASC ;AH=37,AL=34
        AND  AX,0F0FH            ;mask 3 to get unpacked BCD
        XCHG AH,AL               ;swap AH and AL
        MOV  CL,4                ;CL=04 to shift 4 times
        SHL  AH,CL               ;shift left AH to get AH=40H
        OR   AL,AH               ;OR them to get packed BCD
        MOV  VAL_BCD,AL          ;save the result


After this conversion, the packed BCD numbers are processed and the result will 
be in packed BCD format. As will be seen later in this section, there are special instruc- 
tions, such as DAA and DAS, which require that the data be in packed BCD form and give 
the result in packed BCD. For the result to be displayed on the monitor or be printed by 
the printer, it must be in ASCII format. Conversion from packed BCD to ASCII is dis- 
cussed next. 
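The ASCII to packed BCD steps (mask, swap, shift, OR) can be followed register by register in a short model. This is a Python sketch; the function name ascii_pair_to_packed is hypothetical, and the variables ah and al only stand in for the two halves of AX.

```python
def ascii_pair_to_packed(pair):
    """Pack two ASCII digits (for example b'47') into one packed BCD byte."""
    ax = int.from_bytes(pair, "little")   # first digit lands in AL, second in AH
    ax &= 0x0F0F                          # mask the 3s to get unpacked BCD
    ah, al = ax >> 8, ax & 0xFF
    ah, al = al, ah                       # XCHG AH,AL
    ah = (ah << 4) & 0xFF                 # SHL AH,CL with CL = 4
    al |= ah                              # OR AL,AH
    return al


print(format(ascii_pair_to_packed(b"47"), "02X"))
print(format(ascii_pair_to_packed(b"95"), "02X"))
```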


Packed BCD to ASCII conversion 


To convert packed BCD to ASCII, it must first be converted to unpacked and then 
the unpacked BCD is tagged with 011 0000 (30H). The following shows the process of 
converting from packed BCD to ASCII. 


Packed BCD    Unpacked BCD             ASCII
29H           02H & 09H                32H & 39H
0010 1001     0000 0010 & 0000 1001    011 0010 & 011 1001


VAL1_BCD DB  29H
VAL3_ASC DW  ?

        MOV  AL,VAL1_BCD
        MOV  AH,AL           ;copy AL to AH. now AH=29,AL=29H
        AND  AX,0F00FH       ;mask 9 from AH and 2 from AL
        MOV  CL,4            ;CL=04 for shift
        SHR  AH,CL           ;shift right AH to get unpacked BCD
        OR   AX,3030H        ;combine with 30 to get ASCII
        XCHG AH,AL           ;swap for ASCII storage convention
        MOV  VAL3_ASC,AX     ;store the ASCII
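The reverse conversion can be traced the same way. This is a Python sketch with a hypothetical function name; the final step models storing AX as a word, where AL goes to the lower address, which is why the swap is needed for the digits to appear in reading order.

```python
def packed_to_ascii(packed):
    """Convert one packed BCD byte (for example 29H) to two ASCII digits."""
    ah = al = packed                 # MOV AH,AL
    ah &= 0xF0                       # AND AX,0F00FH keeps the upper digit in AH
    al &= 0x0F                       # and the lower digit in AL
    ah >>= 4                         # SHR AH,CL with CL = 4
    ah |= 0x30                       # OR AX,3030H
    al |= 0x30
    ah, al = al, ah                  # XCHG AH,AL
    return bytes([al, ah])           # word store: AL at the lower address


print(packed_to_ascii(0x29))
```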


BCD addition and subtraction 


After learning how to convert ASCII to BCD, the application of BCD numbers is 
the next step. There are two instructions that deal specifically with BCD numbers: DAA 
and DAS. Each is discussed separately. 


BCD addition and correction 


There is a problem with adding BCD numbers, which must be corrected. The
problem is that after adding packed BCD numbers, the result is no longer BCD. Look at
this example:


        MOV  AL,17H
        ADD  AL,28H


Adding them gives 0011 1111B (3FH), which is not BCD! A BCD number can 
only have digits from 0000 to 1001 (or 0 to 9). In other words, adding two BCD numbers 
must give a BCD result. The result above should have been 17 + 28 = 45 (0100 0101). To 
correct this problem, the programmer must add 6 (0110) to the low digit: 3F + 06 = 45H. 
The same problem could have happened in the upper digit (for example, in 52H + 87H = 
D9H). Again to solve this problem, 6 must be added to the upper digit (D9H + 60H = 
139H), to ensure that the result is BCD (52 + 87 = 139). This problem is so pervasive that 
the vast majority of microprocessors have an instruction to deal with it. 


DAA 


The DAA (decimal adjust for addition) instruction in x86 microprocessors is pro- 
vided exactly for the purpose of correcting the problem associated with BCD addition. 
DAA will add 6 to the lower nibble or higher nibble if needed; otherwise, it will leave the 
result alone. The following example will clarify these points: 


DATA1   DB   47H
DATA2   DB   25H
DATA3   DB   ?

        MOV  AL,DATA1   ;AL holds first BCD operand
        MOV  BL,DATA2   ;BL holds second BCD operand
        ADD  AL,BL      ;BCD addition
        DAA             ;adjust for BCD addition
        MOV  DATA3,AL   ;store result in correct BCD form


After the program is executed, the DATA3 field will contain 72H (47 + 25 = 72). 
Note that DAA works only on AL. In other words, while the source can be an operand of 
any addressing mode, the destination must be AL in order for DAA to work. It needs to 
be emphasized that DAA must be used after the addition of BCD operands and that BCD 
operands can never have any digit greater than 9. In other words, no A-F digit is allowed. 
It is also important to note that DAA works only after an ADD or ADC instruction; it will
not work after the INC instruction.


Summary of DAA action 


1. If after an ADD or ADC instruction the lower nibble (4 bits) is greater than 9, or if AF
   = 1, add 0110 to the lower 4 bits.
2. If the upper nibble is greater than 9, or if CF = 1, add 0110 to the upper nibble.

In reality there is no other use for the AF (auxiliary flag) except for BCD addition
and correction. For example, adding 29H and 18H will result in 41H, which is incorrect
as far as BCD is concerned. See the following:


        Hex       BCD
         29       0010 1001
       + 18     + 0001 1000
         41       0100 0001    Because AF = 1,
       +  6     +      0110    DAA adds 6 to lower nibble.
         47       0100 0111    The final result is BCD.


Program 3-6 demonstrates the use of DAA after addition of multibyte packed 
BCD numbers. 
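The DAA rules summarized above, and the byte-by-byte ADC/DAA loop used for multibyte operands, can be modeled as follows. This is a Python sketch with hypothetical function names. It assumes the usual formulation of the upper-digit test (the value of AL before adjustment is compared with 99H), which agrees with the two-step summary for valid BCD operands.

```python
def daa(al, cf, af):
    """Decimal-adjust AL after an addition; returns (AL, CF)."""
    old_al = al
    if (al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF      # correct the lower digit
    if old_al > 0x99 or cf:
        al = (al + 0x60) & 0xFF      # correct the upper digit
        cf = 1
    else:
        cf = 0
    return al, cf


def bcd_add_byte(a, b, carry_in=0):
    """Model of ADC AL,b followed by DAA for two packed BCD bytes."""
    raw = a + b + carry_in
    af = int((a & 0x0F) + (b & 0x0F) + carry_in > 0x0F)
    return daa(raw & 0xFF, raw >> 8, af)


def bcd_add(x, y):
    """Add equal-length packed BCD strings stored most significant byte first."""
    result = bytearray(len(x))
    carry = 0
    for i in range(len(x) - 1, -1, -1):   # start at the least significant byte
        result[i], carry = bcd_add_byte(x[i], y[i], carry)
    return bytes(result), carry


print(bcd_add_byte(0x17, 0x28))
print(bcd_add_byte(0x29, 0x18))
total, carry = bcd_add(bytes.fromhex("0649147816"), bytes.fromhex("0072687188"))
print(total.hex(), carry)
```

The last call uses the two ten-digit operands of Program 3-6 after they have been packed into five bytes each.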




Two sets of ASCII data have come in from the keyboard. Write and run a program to: 
1. Convert from ASCII to packed BCD. 

2. Add the multibyte packed BCD and save it. 

3. Convert the packed BCD result to ASCII. 


TITLE   PROG3-6 (EXE) ASCII TO BCD CONVERSION AND ADDITION
PAGE    60,132
.MODEL SMALL
.STACK 64
;--------------
        .DATA
DATA1_ASC DB  '0649147816'
          ORG 0010H
DATA2_ASC DB  '0072687188'
          ORG 0020H
DATA3_BCD DB  5 DUP (?)
          ORG 0028H
DATA4_BCD DB  5 DUP (?)
DATA5_ADD DB  5 DUP (?)
DATA6_ASC DB  10 DUP (?)
;--------------
        .CODE
MAIN    PROC FAR
        MOV  AX,@DATA
        MOV  DS,AX
        MOV  BX,OFFSET DATA1_ASC ;BX points to first ASCII data
        MOV  DI,OFFSET DATA3_BCD ;DI points to first BCD data
        MOV  CX,10               ;CX holds number bytes to convert
        CALL CONV_BCD            ;convert ASCII to BCD
        MOV  BX,OFFSET DATA2_ASC ;BX points to second ASCII data
        MOV  DI,OFFSET DATA4_BCD ;DI points to second BCD data
        MOV  CX,10               ;CX holds number bytes to convert
        CALL CONV_BCD            ;convert ASCII to BCD
        CALL BCD_ADD             ;add the BCD operands
        MOV  SI,OFFSET DATA5_ADD ;SI points to BCD result
        MOV  DI,OFFSET DATA6_ASC ;DI points to ASCII result
        MOV  CX,05               ;CX holds count for convert
        CALL CONV_ASC            ;convert result to ASCII
        MOV  AH,4CH
        INT  21H                 ;go back to OS
MAIN    ENDP
;--------------
;THIS SUBROUTINE CONVERTS ASCII TO PACKED BCD
CONV_BCD PROC
        SHR  CX,1               ;each pass packs 2 ASCII bytes: loop CX/2 times
AGAIN:  MOV  AX,[BX]            ;BX=pointer for ASCII data
        XCHG AH,AL
        AND  AX,0F0FH           ;mask ASCII 3s
        PUSH CX                 ;save the counter
        MOV  CL,4               ;shift AH left 4 bits
        SHL  AH,CL              ;to get ready for packing
        OR   AL,AH              ;combine to make packed BCD
        MOV  [DI],AL            ;DI=pointer for BCD data
        ADD  BX,2               ;point to next 2 ASCII bytes
        INC  DI                 ;point to next BCD data
        POP  CX                 ;restore loop counter
        LOOP AGAIN
        RET
CONV_BCD ENDP


Program 3-6 (continued on the following page) 


aaae 
116 


;THIS SUBROUTINE ADDS TWO MULTIBYTE PACKED BCD OPERANDS 
BCD ADD PROC 


MOV BX,OFFSET DATA3 BCD *;BX=pointer for operand 1 
MOV DI,OFFSET DATA4 BCD ;DI=pointer for operand 2 
MOV SI,OFFSET DATAS ADD 7 ol —pemnitem for Sum 
MOV = exe OS 
Cie 
BACK: MOV AL,[ BX] +4 -get next byte of operand 1 
ADC AL,[ DI] +4 ;add next byte of operand 2 
DAA Correct tor BCDPaddi tion 
MOV [SI] +4,AL ;save sum 
DEC BX ;point to next byte of operand 1 
DEC DI ;point to next byte of operand 2 
DEC Si ;point to next byte of sum 
LOOP BACK 
RET 


BCD_ADD ENDP 


;THIS SUBROUTINE CONVERTS FROM PACKED BCD TO ASCII 
CONV_ASC PROC 


AGAIN2: MOV AL,[ SI] ;SIl=pointer for BCD data 
MOV AH, AL ;duplicate to unpack 
AND AX, OFOOFH junpack 
PUSH CX ; save counter 
MOV CL, 04 ;shift right 4 bits to unpack 
SHR AH, CL ;the upper nibble 
OR AX, 3030H ;make it ASCII 
XCHG TAH ALE ;swap for ASCII storage convention 
MOV [ DI] , AX ;store ASCII data 
INC Sal 7pOint to next BCD data 
ADD DI aA "point ito mezt ASCII data 
BOE EX ¿restore loop counter 
LOOP AGAIN2 
RET 
CONV_ASC ENDP 


END MAIN 


Program 3-6 (continued from the preceding page) 


The following shows that 6 is added to the upper nibble due to the fact it is greater
than 9:

Hex      BCD
  53       0101 0011
+ 75     + 0111 0101
  C8       1100 1000    Because the upper nibble is greater than 9,
+ 6      + 0110         DAA adds 6 to upper nibble.
1 28     1 0010 1000    The final result is BCD.


See Appendix B for more examples of DAA. 
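
The three steps of Program 3-6 (ASCII to packed BCD, multibyte BCD addition, packed BCD back to ASCII) can be sketched in C as follows. This is an illustration under stated assumptions, not a translation of the listing: the names pack_bcd, bcd_add_byte, and ascii_bcd_add are hypothetical, and the byte addition uses plain digit-wise decimal arithmetic, which gives the same result as ADC followed by DAA when both operands are valid packed BCD.

```c
#include <stdint.h>
#include <string.h>

#define MAX_BCD_BYTES 16   /* arbitrary limit chosen for this sketch */

/* Pack two ASCII digits into one packed BCD byte (one CONV_BCD iteration). */
static uint8_t pack_bcd(char hi, char lo)
{
    return (uint8_t)(((hi & 0x0F) << 4) | (lo & 0x0F));
}

/* Add two packed BCD bytes plus a carry-in; update the carry for the next byte. */
static uint8_t bcd_add_byte(uint8_t a, uint8_t b, int *carry)
{
    unsigned lo = (a & 0x0Fu) + (b & 0x0Fu) + (unsigned)*carry;
    unsigned hi = (unsigned)(a >> 4) + (unsigned)(b >> 4);

    if (lo > 9) { lo -= 10; hi += 1; }   /* decimal carry into the upper digit */
    *carry = hi > 9;                     /* decimal carry out of the byte */
    if (hi > 9) hi -= 10;
    return (uint8_t)((hi << 4) | lo);
}

/* Add two equal-length ASCII decimal strings with an even number of digits.
   Writes the ASCII sum to out (same length plus a terminator).
   Returns the final carry (0 or 1), or -1 if the input cannot be handled. */
int ascii_bcd_add(const char *a, const char *b, char *out)
{
    uint8_t x[MAX_BCD_BYTES], y[MAX_BCD_BYTES], sum[MAX_BCD_BYTES];
    size_t digits = strlen(a);
    size_t n = digits / 2;
    int carry = 0;

    if (strlen(b) != digits || digits % 2 != 0 || n > MAX_BCD_BYTES)
        return -1;

    for (size_t i = 0; i < n; i++) {            /* ASCII -> packed BCD */
        x[i] = pack_bcd(a[2 * i], a[2 * i + 1]);
        y[i] = pack_bcd(b[2 * i], b[2 * i + 1]);
    }
    for (size_t i = n; i-- > 0; )               /* add from the last byte backward */
        sum[i] = bcd_add_byte(x[i], y[i], &carry);
    for (size_t i = 0; i < n; i++) {            /* packed BCD -> ASCII */
        out[2 * i]     = (char)('0' | (sum[i] >> 4));
        out[2 * i + 1] = (char)('0' | (sum[i] & 0x0F));
    }
    out[digits] = '\0';
    return carry;
}
```

Called with the two strings used in Program 3-6, '0649147816' and '0072687188', the sketch produces '0721835004' with no final carry.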


BCD subtraction and correction 


The problem associated with the addition of packed BCD numbers also shows up 
in subtraction. Again, there is an instruction (DAS) specifically designed to solve the 
problem. Therefore, when subtracting packed BCD (single-byte or multibyte) operands, 
the DAS instruction is put after the SUB or SBB instruction. AL must be used as the des- 
tination register to make DAS work. 




Summary of DAS action 


1. If after a SUB or SBB instruction the lower nibble is greater than 9, or if AF = 1,
subtract 0110 from the lower 4 bits.
2. If the upper nibble is greater than 9, or CF = 1, subtract 0110 from the upper nibble.

Due to the widespread use of BCD numbers, a specific data directive, DT, has
been created. DT can be used to represent BCD numbers from 0 to 10^20 - 1 (that is,
twenty 9s). Assume that the following operands represent the budget, the expenses, and the
balance, which is the budget minus the expenses.


BUDGET    DT   87965141012
EXPENSES  DT   31610640392
BALANCE   DT   ?                          ;balance = budget - expenses

          MOV  CX,10                      ;counter=10
          MOV  BX,00                      ;pointer=0
          CLC                             ;clear carry for the 1st iteration
BACK:     MOV  AL,BYTE PTR BUDGET[BX]     ;get a byte of the BUDGET
          SBB  AL,BYTE PTR EXPENSES[BX]   ;subtract a byte from it
          DAS                             ;correct the result for BCD
          MOV  BYTE PTR BALANCE[BX],AL    ;save it in BALANCE
          INC  BX                         ;increment for the next byte
          LOOP BACK                       ;continue until CX=0


Notice in the code section above that (1) no H (hex) indicator is needed for BCD 
numbers when using the DT directive, and (2) the use of the based relative addressing 
mode (BX + displacement) allows access to all three arrays with a single register BX. 
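
The SBB/DAS loop above can be modeled in C to check the arithmetic. The sketch assumes that DT stores its packed BCD bytes with the least significant byte at the lowest address, which is why the loop starts at offset 0 and increments BX. The names to_dt, from_dt, sbb_das, and dt_sub are hypothetical helpers, and the model is meant only for valid packed BCD operands.

```c
#include <stddef.h>
#include <stdint.h>

#define DT_BYTES 10

/* Store a value as 10 packed BCD bytes, least significant byte first. */
void to_dt(unsigned long long value, uint8_t dt[DT_BYTES])
{
    for (size_t i = 0; i < DT_BYTES; i++) {
        unsigned lo = (unsigned)(value % 10); value /= 10;
        unsigned hi = (unsigned)(value % 10); value /= 10;
        dt[i] = (uint8_t)((hi << 4) | lo);
    }
}

/* Read the packed BCD bytes back (assumes the value fits in unsigned long long). */
unsigned long long from_dt(const uint8_t dt[DT_BYTES])
{
    unsigned long long value = 0;
    for (size_t i = DT_BYTES; i-- > 0; )
        value = value * 100 + (unsigned)(dt[i] >> 4) * 10u + (dt[i] & 0x0Fu);
    return value;
}

/* One loop iteration: SBB AL,src followed by DAS. *cf is the borrow in and out. */
uint8_t sbb_das(uint8_t al, uint8_t src, int *cf)
{
    int borrow_in = *cf;
    int af = (al & 0x0F) < (src & 0x0F) + borrow_in;   /* borrow from the lower nibble */
    int old_cf = (int)al < (int)src + borrow_in;       /* borrow out of the byte */
    uint8_t r = (uint8_t)(al - src - borrow_in);
    uint8_t old = r;
    int new_cf = 0;

    if ((r & 0x0F) > 9 || af) {                        /* DAS step 1 */
        new_cf = old_cf || (r < 6);
        r = (uint8_t)(r - 0x06);
    }
    if (old > 0x99 || old_cf) {                        /* DAS step 2 */
        r = (uint8_t)(r - 0x60);
        new_cf = 1;
    }
    *cf = new_cf;
    return r;
}

/* out = a - b over all 10 bytes; returns the final borrow. */
int dt_sub(const uint8_t a[DT_BYTES], const uint8_t b[DT_BYTES], uint8_t out[DT_BYTES])
{
    int cf = 0;                                        /* CLC */
    for (size_t i = 0; i < DT_BYTES; i++)
        out[i] = sbb_das(a[i], b[i], &cf);
    return cf;
}
```

For the budget and expenses shown above, the model leaves 56354500620 in the result with no final borrow.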


Review Questions 


1. For the following decimal numbers, give the packed BCD and unpacked BCD repre- 
sentations. 
(a) 15 (b) 99 
2. True or false. The DAA instruction must be used after the ADD instruction. 
3. True or false. The DAS instruction must be used after the SUB instruction. 
4. Find the value of AL after the following code is executed. 
MOV AL,29H 
ADD AL,18H 
DAA 


SECTION 3.5: ROTATE INSTRUCTIONS 


In many applications there is a need to perform a bitwise rotation of an operand. 
The rotation instructions ROR, ROL and RCR, RCL are designed specifically for that pur- 
pose. They allow a program to rotate an operand right or left. In this section we explore 
the rotate instructions, which frequently have highly specialized applications. In rotate 
instructions, the operand can be in a register or memory. If the number of times an operand 
is to be rotated is more than 1, this is indicated by CL. This is similar to the shift instruc- 
tions. There are two types of rotations. One is a simple rotation of the bits of the operand, 
and the other is a rotation through the carry. Each is explained below. 


Rotating the bits of an operand right and left 


ROR rotate right 


In rotate right, as bits are shifted from left to right they exit from the right end
(LSB) and enter the left end (MSB). In addition, as each bit exits the LSB, a copy of it is
given to the carry flag. In other words, in ROR the LSB is moved to the MSB and is also
copied to CF, as shown in the diagram. If the operand is to be rotated once, the 1 is
coded, but if it is to be rotated more than once, register CL is used to hold the number
of times it is to be rotated.

[Diagram: bits move from MSB toward LSB; the LSB goes to CF and also re-enters at the MSB]

          MOV  AL,36H      ;AL=0011 0110
          ROR  AL,1        ;AL=0001 1011  CF=0
          ROR  AL,1        ;AL=1000 1101  CF=1
          ROR  AL,1        ;AL=1100 0110  CF=1
;or:
          MOV  AL,36H      ;AL=0011 0110
          MOV  CL,3        ;CL=3 number of times to rotate
          ROR  AL,CL       ;AL=1100 0110  CF=1
;the operand can be a word:
          MOV  BX,0C7E5H   ;BX=1100 0111 1110 0101
          MOV  CL,6        ;CL=6 number of times to rotate
          ROR  BX,CL       ;BX=1001 0111 0001 1111  CF=1


ROL rotate left


In rotate left, as bits are shifted from right to left they exit the left end (MSB) and
enter the right end (LSB). In addition, every bit that leaves the MSB is copied to
the carry flag. In other words, in ROL the MSB is moved to the LSB and is also
copied to CF, as shown in the diagram. If the operand is to be rotated once, the 1 is
coded. Otherwise, the number of times it is to be rotated is in CL.

[Diagram: bits move from LSB toward MSB; the MSB goes to CF and also re-enters at the LSB]

          MOV  BH,72H      ;BH=0111 0010
          ROL  BH,1        ;BH=1110 0100  CF=0
          ROL  BH,1        ;BH=1100 1001  CF=1
          ROL  BH,1        ;BH=1001 0011  CF=1
          ROL  BH,1        ;BH=0010 0111  CF=1
;or:
          MOV  BH,72H      ;BH=0111 0010
          MOV  CL,4        ;CL=4 number of times to rotate
          ROL  BH,CL       ;BH=0010 0111  CF=1
;The operand can be a word:
          MOV  DX,672AH    ;DX=0110 0111 0010 1010
          MOV  CL,3        ;CL=3 number of times to rotate
          ROL  DX,CL       ;DX=0011 1001 0101 0011  CF=1
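
C has no rotate operator, so a rotation is normally written as two shifts ORed together. The following sketch models ROR and ROL one bit at a time so that the carry flag can be tracked as well. The function names and the cf out-parameter are assumptions made for illustration; cf is left unchanged when count is 0.

```c
#include <stdint.h>

/* ROR on a byte: the LSB moves to the MSB and is copied to CF. */
uint8_t ror8(uint8_t value, unsigned count, int *cf)
{
    for (unsigned i = 0; i < count; i++) {
        *cf = value & 1;
        value = (uint8_t)((value >> 1) | (*cf << 7));
    }
    return value;
}

/* ROL on a byte: the MSB moves to the LSB and is copied to CF. */
uint8_t rol8(uint8_t value, unsigned count, int *cf)
{
    for (unsigned i = 0; i < count; i++) {
        *cf = (value >> 7) & 1;
        value = (uint8_t)((value << 1) | *cf);
    }
    return value;
}

/* Word-sized versions of the same two operations. */
uint16_t ror16(uint16_t value, unsigned count, int *cf)
{
    for (unsigned i = 0; i < count; i++) {
        *cf = value & 1;
        value = (uint16_t)((value >> 1) | (*cf << 15));
    }
    return value;
}

uint16_t rol16(uint16_t value, unsigned count, int *cf)
{
    for (unsigned i = 0; i < count; i++) {
        *cf = (value >> 15) & 1;
        value = (uint16_t)((value << 1) | *cf);
    }
    return value;
}
```

For example, ror8(0x36, 3, &cf) returns C6H with cf = 1, and rol8(0x72, 4, &cf) returns 27H with cf = 1.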


Program 3-7 shows an application of the rotation instruction. The maximum count
in Program 3-7 will be 8 since the program is counting the number of 1s in a byte of data.
If the operand is a 16-bit word, the number of 1s can go as high as 16. Program 3-8 is
Program 3-7, rewritten for a word-sized operand. It also provides the count in BCD format
instead of hex. Reminder: AL is used to make a BCD counter because the DAA
instruction works only on AL.
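
The counting technique for a word-sized operand can be sketched in C as follows. The function name count_ones_bcd is hypothetical. The rotate is modeled with shifts, and the BCD adjustment after each increment handles only the case that can occur when adding 1 (a lower digit that has gone past 9), which is a simplification of the full DAA rule.

```c
#include <stdint.h>

/* Count the 1 bits of a 16-bit word and return the count as packed BCD. */
uint8_t count_ones_bcd(uint16_t word)
{
    uint8_t count = 0;                          /* packed BCD counter, like AL */

    for (int i = 0; i < 16; i++) {
        int cf = (word >> 15) & 1;              /* ROL: MSB is copied to CF ... */
        word = (uint16_t)((word << 1) | cf);    /* ... and re-enters at the LSB */
        if (cf) {                               /* CF=1 means the bit was a 1 */
            count = (uint8_t)(count + 1);       /* ADD AL,1 */
            if ((count & 0x0F) > 9)             /* adjust for BCD */
                count = (uint8_t)(count + 6);
        }
    }
    return count;
}
```

After 16 rotations the word is back to its original value, so the operand is not destroyed by the count. For 97F4H, which contains ten 1s, the function returns 10H, that is, 10 in packed BCD.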


Write a program that finds the number of 1s in a byte.

;From the data segment:
DATA1     DB   97H
COUNT     DB   ?
;From the code segment:
          SUB  BL,BL       ;clear BL to keep the number of 1s
          MOV  DL,8        ;rotate total of 8 times
          MOV  AL,DATA1
AGAIN:    ROL  AL,1        ;rotate it once
          JNC  NEXT        ;check for 1
          INC  BL          ;if CF=1 then add one to count
NEXT:     DEC  DL          ;go through this 8 times
          JNZ  AGAIN       ;if not finished go back
          MOV  COUNT,BL    ;save the number of 1s


Program 3-7


Write a program to count the number of 1s in a word. Provide the count in BCD.

DATAW1    DW   97F4H
COUNT2    DB   ?

          SUB  AL,AL       ;clear AL to keep the number of 1s in BCD
          MOV  DL,16       ;rotate total of 16 times
          MOV  BX,DATAW1   ;move the operand to BX
AGAIN:    ROL  BX,1        ;rotate it once
          JNC  NEXT        ;check for 1. If CF=0 then jump
          ADD  AL,1        ;if CF=1 then add one to count
          DAA              ;adjust the count for BCD
NEXT:     DEC  DL          ;go through this 16 times
          JNZ  AGAIN       ;if not finished go back
          MOV  COUNT2,AL   ;save the number of 1s in COUNT2


Program 3-8


RCR rotate right through carry


In RCR, as bits are shifted from left to right, they exit the right end (LSB) to
the carry flag, and the carry flag enters the left end (MSB). In other words, in RCR the
LSB is moved to CF and CF is moved to the MSB. In reality, CF acts as if it is part of
the operand. This is shown in the diagram. If the operand is to be rotated once, the 1 is
coded, but if it is to be rotated more than once, the register CL holds the number of times.

[Diagram: CF feeds the MSB; bits move from MSB toward LSB; the LSB goes to CF]


          CLC              ;make CF=0
          MOV  AL,26H      ;AL=0010 0110
          RCR  AL,1        ;AL=0001 0011  CF=0
          RCR  AL,1        ;AL=0000 1001  CF=1
          RCR  AL,1        ;AL=1000 0100  CF=1
;or:
          CLC              ;make CF=0
          MOV  AL,26H      ;AL=0010 0110
          MOV  CL,3        ;CL=3 number of times to rotate
          RCR  AL,CL       ;AL=1000 0100  CF=1

;the operand can be a word
          STC              ;make CF=1
          MOV  BX,37F1H    ;BX=0011 0111 1111 0001
          MOV  CL,5        ;CL=5 number of times to rotate
          RCR  BX,CL       ;BX=0001 1001 1011 1111  CF=1 (the last bit rotated out is bit 4 of 37F1H, which is 1)


RCL rotate left through carry 


In RCL, as bits are shifted from right to left they exit the left end (MSB) and enter
the carry flag, and the carry flag enters the right end (LSB). In other words, in RCL the
MSB is moved to CF and CF is moved to the LSB. In reality, CF acts as if it is part of the
operand. This is shown in the diagram. If the operand is to be rotated once, the 1 is
coded, but if it is to be rotated more than once, register CL holds the number of times.

[Diagram: CF feeds the LSB; bits move from LSB toward MSB; the MSB goes to CF]

          STC              ;make CF=1
          MOV  BL,15H      ;BL=0001 0101
          RCL  BL,1        ;BL=0010 1011  CF=0
          RCL  BL,1        ;BL=0101 0110  CF=0
;or:
          STC              ;make CF=1
          MOV  BL,15H      ;BL=0001 0101
          MOV  CL,2        ;CL=2 number of times for rotation
          RCL  BL,CL       ;BL=0101 0110  CF=0

;the operand can be a word:
          CLC              ;make CF=0
          MOV  AX,191CH    ;AX=0001 1001 0001 1100
          MOV  CL,5        ;CL=5 number of times to rotate
          RCL  AX,CL       ;AX=0010 0011 1000 0001  CF=1
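
Because CF takes part in the rotation, RCR and RCL behave like 9-bit rotations on a byte and 17-bit rotations on a word. A minimal C model, with hypothetical function names and the carry passed in and out through cf, might look like this:

```c
#include <stdint.h>

/* RCR on a byte: CF enters at the MSB, the LSB leaves into CF. */
uint8_t rcr8(uint8_t value, unsigned count, int *cf)
{
    for (unsigned i = 0; i < count; i++) {
        int out = value & 1;
        value = (uint8_t)((value >> 1) | (*cf << 7));
        *cf = out;
    }
    return value;
}

/* RCL on a byte: CF enters at the LSB, the MSB leaves into CF. */
uint8_t rcl8(uint8_t value, unsigned count, int *cf)
{
    for (unsigned i = 0; i < count; i++) {
        int out = (value >> 7) & 1;
        value = (uint8_t)((value << 1) | *cf);
        *cf = out;
    }
    return value;
}

/* Word-sized versions of the same two operations. */
uint16_t rcr16(uint16_t value, unsigned count, int *cf)
{
    for (unsigned i = 0; i < count; i++) {
        int out = value & 1;
        value = (uint16_t)((value >> 1) | (*cf << 15));
        *cf = out;
    }
    return value;
}

uint16_t rcl16(uint16_t value, unsigned count, int *cf)
{
    for (unsigned i = 0; i < count; i++) {
        int out = (value >> 15) & 1;
        value = (uint16_t)((value << 1) | *cf);
        *cf = out;
    }
    return value;
}
```

Starting with cf = 0, rcr8(0x26, 3, &cf) returns 84H and leaves cf = 1. Starting with cf = 1, rcl8(0x15, 2, &cf) returns 56H and leaves cf = 0.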


Review Questions 


1. What is the value of BL after the following?
          MOV  BL,25H
          MOV  CL,4
          ROR  BL,CL
2. What are the values of DX and CF after the following?
          MOV  DX,3FA2H
          MOV  CL,7
          ROL  DX,CL
3. What is the value of BH after the following?
          SUB  BH,BH
          STC
          RCR  BH,1
          STC
          RCR  BH,1
4. What is the value of BX after the following?
          MOV  BX,0FFFFH
          MOV  CL,5
          CLC
          RCL  BX,CL
5. Why does "ROR BX,4" give an error in the 8086? How would you change the code
to make it work?


SECTION 3.6: BITWISE OPERATORS IN THE C LANGUAGE 


One of the most important and powerful features of the C language is its ability 
to perform bit manipulation. Due to the fact that many books on C do not cover this impor- 
tant topic, it is appropriate to discuss it in this section. This section describes the action of 
operators and provides examples. 


Bitwise operators in C 


While every C programmer is familiar with the logical operators AND (&&), OR
(||), and NOT (!), many C programmers are less familiar with the bitwise operators AND
(&), OR (|), EX-OR (^), inverter (~), Shift Right (>>), and Shift Left (<<). These bitwise
operators are widely used in software engineering and control; consequently, their
understanding and mastery are critical in system design and interfacing. See Table 3-4.




Table 3-4

            AND     OR      EX-OR           Inverter
A   B       A&B     A|B     A^B             B   ~B
0   0       0       0       0               0   1
0   1       0       1       1               1   0
1   0       0       1       1
1   1       1       1       0


The following code shows Examples 3-5 through 3-7 using the C bitwise operators.
Recall that "0x" in the C language indicates that the data is in hex format.


0x35 & 0x0F = 0x05          /* ANDing: see Example 3-5 */
0x0504 | 0xDA68 = 0xDF6C    /* ORing: see Example 3-6 */
0x54 ^ 0x78 = 0x2C          /* XORing: see Example 3-7 */
~0x37 = 0xC8                /* inverting 37H */


The last one is like the NOT instruction in x86 microprocessors: 


MOV AL,37H ; AL=37H 
NOT AL ;AFTER INVERTING 37, AL=C8H 


Bitwise shift operators in C 


There are two bitwise shift operators in C: Shift Right (>>) and Shift Left (<<).
For unsigned operands they perform the same operation as SHR and SHL in Assembly
language, as discussed in Section 3.3. Their format in C is as follows:


data >> number of bits to be shifted    /* shifting right */
data << number of bits to be shifted    /* shifting left */

The following example shows the use of shift operators in C:


0x9A >> 3 = 0x13          // shifting right 3 times: see Example 3-9
0x7777 >> 4 = 0x0777      // shifting right 4 times: see Example 3-10
0x6 << 4 = 0x60           // shifting left 4 times: see Example 3-11


Program 3-9 demonstrates the syntax of bitwise operators in C. Next we show 
some real-world examples of their usage. 


Packed BCD to ASCII conversion in C 


Section 3.4 showed one way to convert a BCD number to ASCII. This conversion 
is widely used when dealing with a real-time clock chip. Many of the real-time clock chips 
provide very accurate time and date for up to ten years without the need for external 
power. There is a real-time clock in every x86 IBM PC or compatible computer. However, 
these chips provide the time and date in packed BCD. In order to display the data, it needs 
to be converted to ASCII. Program 3-10 is a C version of the packed BCD-to-ASCII con- 
version example discussed in Section 3.4. Program 3-10 converts a byte of packed BCD 
data into two ASCII characters and displays them using the C bitwise operators. 

Notice in Program 3-10 that if the packed BCD data is displayed without conver- 
sion to ASCII, we get the parenthesis ")". See Appendix F. 




/* Program 3-9 repeats Examples 3-5 through 3-11 in C */
#include <stdio.h>

int main(void)
{
// Notice the way data is defined in C for hex format using 0x
unsigned char data_1 = 0x35;
unsigned int data_2 = 0x504;
unsigned int data_3 = 0xDA68;
unsigned char data_4 = 0x54;
unsigned char data_5 = 0x78;
unsigned char data_6 = 0x37;
unsigned char data_7 = 0x09A;
unsigned char temp;
unsigned int temp_2;

temp=data_1&0x0F;       //ANDing
printf("\nMasking the upper four bits of %X (hex) we get %X (hex)\n",data_1,temp);

temp_2=data_2|data_3;   //ORing
printf("The result of %X hex ORed with %X hex is %X hex\n",data_2,data_3,temp_2);

temp=data_4^data_5;     //EX-ORing
printf("The result of %X hex EX-ORed with %X hex is %X hex\n",data_4,data_5,temp);

temp=~data_6;           //INVERTING
printf("The result of %X hex inverted is %X hex\n",data_6,temp);

temp=data_7>>3;         //SHIFTING Right
printf("When %X hex is shifted right three times we get %X hex\n",data_7,temp);

printf("When %X hex is shifted right four times we get %X hex\n",0x7777,0x7777>>4);

temp=(0x6<<4);          //SHIFTING Left
printf("When %X hex is shifted left %d times we get %X hex\n",0x6,4,temp);

return 0;
}


Program 3-9


/* Program 3-10 shows packed BCD-to-ASCII conversion using logical
bitwise operators in C */
#include <stdio.h>

int main(void)
{
unsigned char mybcd=0x29;   /* declare a BCD number in hex */
unsigned char asci_1;
unsigned char asci_2;

asci_1=mybcd&0x0f;          /* mask the upper four bits */
asci_1=asci_1|0x30;         /* make it an ASCII character */
asci_2=mybcd&0xf0;          /* mask the lower four bits */
asci_2=asci_2>>4;           /* shift it right 4 times */
asci_2=asci_2|0x30;         /* make it an ASCII character */
/* asci_2 holds the upper digit, so it is printed first */
printf("BCD data %X is %c , %c in ASCII\n",mybcd,asci_2,asci_1);
printf("My BCD data is %c if not converted to ASCII\n",mybcd);

return 0;
}


Program 3-10


Testing bits in C 


In many cases of system programming and hardware interfacing, it is necessary
to test a given bit to see if it is high. For example, many devices send a high signal to state
that they are ready for an action or to indicate that they have data. How can the bit (or bits)
be tested? In such cases, the unused bits are masked and then the remaining data is tested.
Program 3-11 asks the user for a byte and tests to see whether or not D0 of that byte
is high.


/* Program 3-11 shows how to test bit D0 to see if it is high */
#include <stdio.h>

int main(void)
{
unsigned char status;
unsigned char temp;

printf("\nType in a Hex value\n");
scanf("%hhX",&status);              //get the data (hh: the target is an unsigned char)
temp=status&0x01;                   //mask all bits except D0
if (temp==0x01)                     //is it high?
    printf("D0 is high");           //if yes, say so
else printf("D0 is low");           //if no, say so

return 0;
}


Program 3-11


The Assembly language version of Program 3-11 is as follows: 


;assume AL=value (in hex)

          AND  AL,01       ;MASK ALL BITS EXCEPT D0
          CMP  AL,01       ;IS D0 HIGH
          JNE  BELOW       ;MAKE A DECISION
          ...              ;YES D0 IS HIGH
BELOW:    ...              ;D0 IS LOW
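
The mask-and-compare idea generalizes to any bit position. The helper below is a small sketch, not part of Program 3-11: the name bit_is_high is hypothetical, and bit is assumed to be in the range 0 to 7.

```c
#include <stdint.h>

/* Return 1 if bit number `bit` (0 = D0 ... 7 = D7) of value is high, else 0. */
int bit_is_high(uint8_t value, unsigned bit)
{
    uint8_t mask = (uint8_t)(1u << bit);   /* a single 1 in the tested position */
    return (value & mask) == mask;         /* mask all other bits, then compare */
}
```

With bit = 0 the mask is 0x01, which is the test performed in Program 3-11; with bit = 3 the mask becomes 0x08.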


Review Questions 


1. What is the result of 0x2F & 0x27?
2. What is the result of 0x2F | 0x27?
3. What is the result of 0x2F ^ 0x27?
4. What is the result of ~0x2F?
5. What is the result of 0x2F >> 3?
6. What is the result of 0x27 << 4?
7. In Program 3-10 if mybcd = 0x32, what is displayed if it is not converted to ASCII?
8. Modify Program 3-11 to test D3.


PROBLEMS 




SECTION 3.1: UNSIGNED ADDITION AND SUBTRACTION 


1. Find CF, ZF, and AF for each of the following. Also indicate the result of the addi- 
tion and where the result is saved. 


(a) MOV BH,3FH        (b) MOV DX,4599H        (c) MOV AX,255
    ADD BH,45H            MOV CX,3458H            STC
                          ADD CX,DX               ADC AX,00

(d) MOV BX,0FF01H     (e) MOV CX,0FFFFH       (f) MOV AH,0FEH
    ADD BL,BH             STC                     STC
                          ADC CX,00               ADC AH,00




2. Write, run, and analyze a program that calculates the total sum paid to a salesperson 
for eight months. The following are the monthly paychecks for those months: $2300, 
$4300, $1200, $3700, $1298, $4323, $5673, $986. 

3. Rewrite Program 3-2 (in Section 3.1) using byte addition. 

4. Write a program that subtracts two multibytes and saves the result. Subtraction 
should be done a byte at a time. Use the data in Program 3-2. 

5. State the three steps involved in a SUB and show the steps for the following data. 
(a) 23H - 12H (b) 43H - 51H (c) 99 - 99


SECTION 3.2: UNSIGNED MULTIPLICATION AND DIVISION 


6. Write, run, and analyze the result of a program that performs the following: 


(1) (a) byte1 x byte2 (b) byte1 x word1 (c) word1 x word2

(2) (a) byte1 / byte2 (b) word1 / word2 (c) doubleword / byte1

Assume byte1 = 230, byte2 = 100, word1 = 9998, word2 = 300, and doubleword =
100000.


SECTION 3.3: LOGIC INSTRUCTIONS 


7. Assume that the following registers contain these hex contents: AX = F000, BX = 
3456, and DX = E390. Perform the following operations. Indicate the result and the 
register where it is stored. Give also ZF and CF in each case. 

Note: the operations are independent of each other. 


(a) AND DX,AX           (b) OR DH,BL

(c) XOR AL,76H          (d) AND DX,DX

(e) XOR AX,AX           (f) OR BX,DX

(g) AND AH,0FFH         (h) OR AX,9999H

(i) XOR DX,0EEEEH       (j) XOR BX,BX

(k) MOV CL,04           (l) SHR DX,1
    SHL AL,CL

(m) MOV CL,3            (n) MOV CL,5
    SHR DL,CL               SHL BX,CL

(o) MOV CL,6
    SHL DX,CL

8. Indicate the status of ZF and CF after CMP is executed in each of the following cases. 

(a) MOV BX,2500         (b) MOV AL,0FFH         (c) MOV DL,34
    CMP BX,1400             CMP AL,6FH              CMP DL,88

(d) SUB AX,AX           (e) XOR DX,DX           (f) SUB CX,CX
    CMP AX,0000             CMP DX,0FFFFH           DEC CX
                                                    CMP CX,0FFFFH

(g) MOV BX,2378H        (h) MOV AL,0AAH
    MOV DX,4000H            AND AL,55H
    CMP DX,BX               CMP AL,00

9. Indicate whether or not the jump happens in each case. 

(a) MOV CL,5            (b) MOV BH,65H          (c) MOV AH,55H
    SUB AL,AL               MOV AL,48H              SUB DL,DL
    SHL AL,CL               OR AL,BH                OR DL,AH
    JNC TARGET              SHL AL,1                MOV CL,AH
                            JC TARGET               AND CL,0FH


10. Rewrite Program 3-3 to find the lowest grade in that class. 

11. Rewrite Program 3-4 to convert all uppercase letters to lowercase. 

12. In the IBM BIOS program for testing flags and registers, verify every jump
(conditional and unconditional) address calculation. Reminder: As mentioned in Chapter 2,
in forward jumps the target address is calculated by adding the displacement value to
IP of the instruction after the jump and by subtracting in backward jumps.


SECTION 3.4: BCD AND ASCII CONVERSION


13. In Program 3-6 rewrite BCD_ADD to do subtraction of the multibyte BCD.

14. Using the DT directive, write a program to subtract two 10-byte BCD numbers.

15. Using the DT directive, write a program to add two 10-byte BCD numbers.

16. We would like to make a counter that counts up from 0 to 99 in BCD. What
instruction would you place in the dotted area?
          SUB  AL,AL
          ADD  AL,1
          ...

17. Write Problem 16 to count down (from 99 to 0).

18. An instructor has the following grading policy: "Curving of grades is achieved by
adding to every grade the difference between 99 and the highest grade in the class."
If the following are the grades of the class, write a program to calculate the grades
after they have been curved: 81, 65, 77, 82, 73, 55, 88, 78, 51, 91, 86, 76. Your
program should work for any set of grades.

19. If we try to divide 1,000,000 by 2:
(a) What kind of problem is associated with this operation in the 8086 CPU?
(b) How does the CPU let us know that there is a problem?

20. Which of the following groups of code perform the same operation as LOOP XXX?
(a) DEC CL      (b) DEC CH      (c) DEC BX      (d) DEC CX
    JNZ XXX         JNZ XXX         JNZ XXX         JNZ XXX

SECTION 3.5: ROTATE INSTRUCTIONS 


21. Write a program that finds the number of zeros in a 16-bit word.
22. Explain the difference between RCL and ROL instructions.


SECTION 3.6: BITWISE OPERATORS IN THE C LANGUAGE 


23. Write a C program with the following components:
(a) have two hex values: data1 = 55H and data2 = AAH, both defined as unsigned
char,
(b) mask the upper 4 bits of data1 and display it in hex,
(c) perform AND, OR, and EX-OR operations between the two data items and then
display each result,
(d) invert one and display it,
(e) shift left data1 four times and shift right data2 two times, then
display each result.

24. Repeat the above problem with two values input from the user. Use the scanf("%X")
function to get the hex data.

25. In the same way that the real-time clock chip provides data in BCD, it also expects
data in BCD when it is being initialized. However, data coming from the keyboard is
in ASCII. Write a C program to convert two ASCII bytes of data to packed BCD.

26. Write a C program in which the user is prompted for a hex value. Then the data is
tested to see if the two least significant bits are high. If so, a message states "D0 and D1
are both high"; otherwise, it states which bit is not high.

27. Repeat the above problem for bits D0 and D7.


ANSWERS TO REVIEW QUESTIONS 


SECTION 3.1: UNSIGNED ADDITION AND SUBTRACTION 


1. Destination
2. In x86 Assembly language, there are no memory-to-memory operations.
3. MOV AX,DATA_2
   ADD DATA_1,AX
4. Destination, source + destination + CF
5. In (a), the byte addition results in a carry to CF; in (b), the word addition results in a
   carry to the high byte BH.
6. DEC CX
   JNZ ADD_LOOP
7.   43H    0100 0011                       0100 0011
   - 05H    0000 0101   2's complement =  + 1111 1011
     3EH                                    0011 1110
   CF = 0; therefore, the result is positive
8. AL = 95 - 4F - 1 = 45


SECTION 3.2: UNSIGNED MULTIPLICATION AND DIVISION 


1. AX 2. DX and AX 3. AX 4. AL, AH
5. AX, DX 6. AL, AH 7. AX, DX


SECTION 3.3: LOGIC INSTRUCTIONS AND SAMPLE PROGRAMS 


1. (a) 4202 (b) CFFF (c) 83DFD

2. The operand will remain unchanged; all zeros

3. All ones

4. All zeros

5. A0F2 = 1010 0000 1111 0010
   shift left:  0100 0001 1110 0100  CF = 1
   shift again: 1000 0011 1100 1000  CF = 0
   shift again: 0000 0111 1001 0000  CF = 1
   A0F2 shifted left three times = 0790

   A0F2 = 1010 0000 1111 0010
   shift right: 0101 0000 0111 1001  CF = 0
   shift again: 0010 1000 0011 1100  CF = 1
   shift again: 0001 0100 0001 1110  CF = 0
   A0F2 shifted right three times = 141E

6. SUB

7. False


SECTION 3.4: BCD AND ASCII CONVERSION 


1. (a) 15 =0001 0101 packed BCD = 0000 0001 0000 0101 unpacked BCD 
(b) 99 = 1001 1001 packed BCD = 0000 1001 0000 1001 unpacked BCD 

2. True 

3. True 

4. AL=47H 


SECTION 3.5: ROTATE INSTRUCTIONS 


1. BL = 52H, CF = 0

2. DX = D11FH, CF = 1

3. BH = C0H

4. BX = FFEFH

5. The source operand cannot be immediate; to fix it:
          MOV  CL,4
          ROR  BX,CL


SECTION 3.6: BITWISE OPERATION IN THE C LANGUAGE 


1. 0x27
2. 0x2F
3. 0x08
4. 0xD0
5. 0x05
6. 0x70
7. 2
8.
/* This program shows how to test bit D3 to see if it is high */
#include <stdio.h>

int main(void)
{
unsigned char status;
unsigned char temp;
printf("\nType in a Hex value\n");
scanf("%hhX",&status);
temp=status&0x08;
if (temp==0x08)
    printf("D3 is high");
else printf("D3 is low");

return 0;
}


CHAPTER 4 


INT 21H AND INT 10H 


PROGRAMMING AND MACROS 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Use INT 10H function calls to:
>>    Clear the screen
>>    Set the cursor position
>>    Write characters to the screen in text mode
>>    Draw lines on the screen in graphics mode
>>    Change the video mode
>> Use INT 21H function calls to:
>>    Input characters from the keyboard
>>    Output characters to the screen
>>    Input or output strings
>> Use the LABEL directive to set up structured data items
>> Code Assembly language instructions to define and invoke macros
>> Explain how macros are expanded by the assembler
>> Use the LOCAL directive to define local variables within macros
>> Use the INCLUDE directive to retrieve macros from other files


There are some extremely useful subroutines within BIOS and the OS that are 
available to the user through the INT (interrupt) instruction. In this chapter, some of them 
are studied to see how they are used in the context of applications. First, a few words 
about the interrupt itself. The INT instruction is somewhat like a FAR call. When it is 
invoked, it saves CS:IP and the flags on the stack and goes to the subroutine associated 
with that interrupt. The INT instruction has the following format: 


          INT  xx          ;the interrupt number xx can be 00 - FFH


Since interrupts are numbered 00 to FF, this gives a total of 256 interrupts in x86 
microprocessors. Of these 256 interrupts, two of them are the most widely used: INT 10H 
and INT 21H. Each one can perform many functions. INT 10H is examined in Section 
4.1, while INT 21H is covered in Section 4.2. Section 4.3 discusses the concept of the 
macro and how it is used. Various functions of INT 21H and INT 10H are selected by the 
value put into the AH register, as shown in Appendices D and E. Interrupt instructions are 
discussed in detail in Appendix B. 


SECTION 4.1: BIOS INT 10H PROGRAMMING 


INT 10H subroutines are burned into the ROM BIOS of the x86-based IBM PC 
and compatibles and are used to communicate with the computer's screen video. The 
manipulation of screen text or graphics can be done through INT 10H. There are many 
functions associated with INT 10H. Among them are changing the color of characters or 
the background color, clearing the screen, and changing the location of the cursor. These 
options are chosen by putting a specific value in register AH. In this section we show how 
to use INT 10H to clear the screen, change the cursor position, change the screen color, 
and draw lines on the screen. 


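          Decimal                              Hex

00,00 ................ 00,79      00,00 ................ 00,4F

          12,39                             0C,27
      screen center                     screen center

24,00 ................ 24,79      18,00 ................ 18,4F


Figure 4-1. Cursor Locations (row, column)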


Monitor screen in text mode 


The monitor screen in the x86 PC is divided into 80 columns and 25 rows in nor- 
mal text mode (see Figure 4-1). In other words, the text screen is 80 characters wide by 
25 characters long. Since both a row and a column number are associated with each loca- 
tion on the screen, one can move the cursor to any location on the screen simply by chang- 
ing the row and column values. The 80 columns are numbered from 0 to 79 and the 25 
rows are numbered 0 to 24. The top left corner has been assigned 00,00 (row = 00,
column = 00). Therefore, the top right corner will be 00,79 (row = 00, column = 79).
Similarly, the bottom left corner is 24,00 (row = 24, column = 00) and the bottom right
corner of the monitor is 24,79 (row = 24, column = 79). Figure 4-1 shows each location 
of the screen in both decimal and hex. 




Clearing the screen using INT 10H function 06H 


It is often desirable to clear the screen before displaying data. To use INT 10H to
clear the screen, the following registers must contain certain values before INT 10H is
called: AH = 06, AL = 00, BH = 07, CX = 0000, DH = 24, and DL = 79. The code will
look like this:

          MOV  AH,06       ;AH=06 to select scroll function
          MOV  AL,00       ;AL=00 the entire page
          MOV  BH,07       ;BH=07 for normal attribute
          MOV  CH,00       ;CH=00 row value of start point
          MOV  CL,00       ;CL=00 column value of start point
          MOV  DH,24       ;DH=24 row value of ending point
          MOV  DL,79       ;DL=79 column value of ending point
          INT  10H         ;invoke the interrupt


Remember that DEBUG assumes immediate operands to be in hex; therefore, DX 
would be entered as 184F. However, MASM assumes immediate operands to be in deci- 
mal. In that case DH = 24 and DL = 79. 
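
Since DH is the upper byte of DX and DL the lower byte, a decimal row and column pair maps onto a single 16-bit value. The short C sketch below shows the arithmetic; the function name cursor_dx is hypothetical and is used only to illustrate how 24,79 becomes 184FH.

```c
#include <stdint.h>

/* Combine row (goes to DH) and column (goes to DL) into the 16-bit DX value. */
uint16_t cursor_dx(uint8_t row, uint8_t column)
{
    return (uint16_t)((row << 8) | column);
}
```

Here cursor_dx(24, 79) gives 184FH, the value that would be typed in DEBUG, and cursor_dx(12, 39) gives 0C27H for a position near the screen center.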

In the program above, one of many options of INT 10H was chosen by putting 06 
into AH. Option AH = 06, called the scroll function, will cause the screen to scroll 
upward. The CH and CL registers hold the starting row and column, respectively, and DH 
and DL hold the ending row and column. To clear the entire screen, one must use the top 
left cursor position of 00,00 for the start point and the bottom right position of 24,79 for 
the end point. 

Option AH = 06 of INT 10H is in reality the "scroll window up" function; there- 
fore, one could use that to make a window of any size by choosing appropriate values for 
the start and end rows and columns. However, to clear the screen, the top left and bottom 
right values are used for start and stop points in order to scroll up the entire screen. It is 
more efficient coding to clear the screen by combining some of the lines above as follows: 


          MOV  AX,0600H    ;scroll entire screen
          MOV  BH,07       ;normal attribute
          MOV  CX,0000     ;start at 00,00
          MOV  DX,184FH    ;end at 24,79 (hex = 18,4F)
          INT  10H         ;invoke the interrupt


INT 10H function 02: setting the cursor to a specific location 


INT 10H function AH = 02 will change the position of the cursor to any location. 
The desired position of the cursor is identified by the row and column values in DX, where 
DH = row and DL = column. Video RAM can have multiple pages of text, but only one 
of them can be viewed at a time. When AH = 02, to set the cursor position, page zero is 
chosen by making BH = 00. 

It must be pointed out that after INT 10H (or INT 21H) has executed, the regis- 
ters that have not been used by the interrupt remain unchanged. In other words, these reg- 
isters have the same values after execution of the interrupt as before the interrupt was 
invoked. Examples 4-1 and 4-2 demonstrate setting the cursor to a specific location. 


Example 4-1 


Write the code to set the cursor position to row = 15 = 0FH and column = 25 = 19H.


Solution:


          MOV  AH,02       ;set cursor option
          MOV  BH,00       ;page 0
          MOV  DL,25       ;column position
          MOV  DH,15       ;row position
          INT  10H         ;invoke interrupt 10H




Example 4-2 


Write a program that (1) clears the screen and (2) sets the cursor at the center of the screen. 


Solution: 

The center of the screen is the point at which the middle row and middle column meet. Row 12 
is at the middle of rows 0 to 24 and column 39 (or 40) is at the middle of columns 0 to 79. By 
setting row = DH = 12 and column = DL = 39, the cursor is set to the screen center. 


;clearing the screen
          MOV  AX,0600H    ;scroll the entire page
          MOV  BH,07       ;normal attribute
          MOV  CX,0000     ;row and column of top left
          MOV  DX,184FH    ;row and column of bottom right
          INT  10H         ;invoke the video BIOS service


;setting the cursor to the center of screen
          MOV  AH,02       ;set cursor option
          MOV  BH,00       ;page 0
          MOV  DL,39       ;center column position
          MOV  DH,12       ;center row position
          INT  10H         ;invoke interrupt 10H


INT 10H function 03: get current cursor position 


In text mode, one is able to determine where the cursor is located at any time by 
executing the following: 


        MOV  AH,03        ;option 03 of BIOS INT 10H
MOV BH, 00 ;page 00 
INT 10H ;interrupt 10H routine 


After execution of the program above, registers DH and DL will have the current 
row and column positions, and CX provides information about the shape of the cursor. 
The reason that page 00 was chosen is that the video memory could contain more than one 
page of data, depending on the video board installed on the PC. In text mode, page 00 is 
chosen for the currently viewed page. 
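
Functions 03 and 02 are often used together. The following fragment is a minimal sketch, assuming 80 x 25 text mode, page 0, and a cursor that is not already on the last row (row 24). It reads the current position and then moves the cursor to the beginning of the next row.

```asm
        MOV  AH,03        ;function 03: get cursor position
        MOV  BH,00        ;page 0
        INT  10H          ;now DH = row, DL = column, CX = cursor shape
        INC  DH           ;go down one row (assumes DH < 24)
        MOV  DL,00        ;column 0
        MOV  AH,02        ;function 02: set cursor position
        MOV  BH,00        ;page 0 (reloaded for clarity)
        INT  10H          ;cursor is now at the start of the next row
```

Since the new row is computed from the value returned in DH, the fragment works no matter where the cursor was left by earlier output.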


Changing the video mode 


To change the video mode, one can use INT 10H with AH = 00 and AL = video 
mode. A list of video modes is given in Appendix D, Table D-2. 


Attribute byte in monochrome monitors 


There is an attribute associated with each character on the screen. The attribute 
provides information to the video circuitry, such as color and intensity of the character 
(foreground) and the background. The attribute byte for each character on the mono- 
chrome monitor is limited. Figure 4-2 shows bit definitions of the monochrome attribute 
byte. 

Foreground refers to the actual character displayed. Normal, highlighted intensi- 
ty and blinking are for the foreground only. The following are some possible variations of 
the attributes in Figure 4-2. 




Binary       Hex   Result

0000 0000    00    black on black (no display)
0000 0111    07    white on black normal
0000 1111    0F    white on black highlight
1000 0111    87    white on black blinking
0111 0111    77    white on white (no display)
0111 0000    70    black on white
1111 0000    F0    black on white blinking


For example, "00000111" would give the normal screen mode where the back- 
ground is black and the foreground is normal intensity, nonblinking. "00001111" would 
give the same mode with the foreground highlighted. "01110000" would give a reverse 
video screen mode with the foreground black and the background normal intensity. See
Example 4-3.


    D7        background intensity: 0 = nonblinking, 1 = blinking
    D6 - D4   background
    D3        foreground intensity: 0 = normal intensity, 1 = highlighted intensity
    D2 - D0   foreground


Figure 4-2. Attribute Byte for Monochrome Monitors


Attribute byte in CGA text mode 


Since all color monitors and their video circuitry are upwardly compatible, in
examples concerning color in this chapter we use CGA mode, the common denominator
for all color monitors. The bit definition of the attribute byte in CGA text mode is as
shown in Figure 4-3. From the bit definition it can be seen that the background can take
eight different colors by combining the prime colors red, blue, and green. The foreground
can be any of 16 different colors by combining red, blue, green, and intensity. Example 4-4
shows the use of the attribute byte in CGA mode. Table 4-1 lists the possible colors. As
examples of some possible variations, look at the following cases:


    D7   D6   D5   D4   D3   D2   D1   D0
    B    R    G    B    I    R    G    B
         background          foreground

    B = blinking
    I = foreground intensity

    Blinking and intensity apply to foreground only.


Figure 4-3. CGA Attribute Byte


Binary       Hex   Color effect
0000 0000    00    Black on black
0000 0001    01    Blue on black
0001 0010    12    Green on blue
0001 0100    14    Red on blue
0001 1111    1F    High-intensity white on blue
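
The attribute values above can also be built at run time from separate background and foreground colors. The following fragment is a sketch only; BG_COLOR and FG_COLOR are hypothetical names chosen for this illustration, and the shift count of 4 follows from the bit layout in Figure 4-3.

```asm
BG_COLOR EQU  1           ;hypothetical name: blue background (0 - 7)
FG_COLOR EQU  0FH         ;hypothetical name: high-intensity white (0 - 15)

        MOV  BL,BG_COLOR  ;BL = 01H
        MOV  CL,4         ;shift count
        SHL  BL,CL        ;background moves into D6-D4, BL = 10H
        OR   BL,FG_COLOR  ;foreground fills D3-D0, BL = 1FH
```

The result, 1FH, matches the last entry of the list above. To make the character blink as well, OR the result with 80H to set D7.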




Example 4-3 


Write a program using INT 10H to: 

(a) Change the video mode. 

(b) Display the letter "D" in 200H locations with attributes black on white blinking (blinking 
letters "D" are black and the screen background is white). 

(c) Then use DEBUG to run and verify the program. 


Solution: 


(a) INT 10H function AH = 00 is used with AL = video mode to change the video mode. Use 
AL = 03. 


        MOV  AH,00        ;SET MODE OPTION
        MOV  AL,03        ;CHANGE THE VIDEO MODE
        INT  10H          ;MODE OF 80X25 FOR ANY COLOR MONITOR


(b) With INT 10H function AH = 09, one can display a character a certain number of times with 
specific attributes. 


        MOV  AH,09        ;DISPLAY OPTION
        MOV  BH,00        ;PAGE 0
        MOV  AL,44H       ;THE ASCII FOR LETTER "D"
        MOV  CX,200H      ;REPEAT IT 200H TIMES
        MOV  BL,0F0H      ;BLACK ON WHITE BLINKING
        INT  10H


(c) Reminder: DEBUG assumes that all the numbers are in hex. 


C>debug
-A
1131:0100 MOV AH,00
1131:0102 MOV AL,03      ;CHANGE THE VIDEO MODE
1131:0104 INT 10
1131:0106 MOV AH,09
1131:0108 MOV BH,00
1131:010A MOV AL,44
1131:010C MOV CX,200
1131:010F MOV BL,F0
1131:0111 INT 10
1131:0113 INT 3
1131:0114


Now see the result by typing in the command -G. Make sure that IP = 100 before running it. 
As an exercise, change the BL register to other attribute values given earlier. For example, BL 
= 07 white on black, or BL = 87H white on black blinking. 


Graphics: pixel resolution and color 


In the text mode, the screen is viewed as a matrix of rows and columns of char-
acters. In graphics mode, the screen is viewed as a matrix of horizontal and vertical pix-
els. The number of pixels varies among monitors and depends on monitor resolution and
the video board. In this section we show how to access and program pixels on the screen.
Before embarking on pixel programming, the relationship between pixel resolution, the
number of colors available, and the amount of video memory in a given video board must
be clarified. There are two facts associated with every pixel on the screen: (1) the location
of the pixel, and (2) its attributes, color and intensity. These two facts must be stored in
the video RAM. Therefore, the higher the number of pixels and colors, the larger the
amount of memory that is needed to store them. In other words, the memory requirement
goes up as the resolution and the number of colors on the monitor go up.


Table 4-1: The 16 Possible Colors

    I  R  G  B    Color
    0  0  0  0    black
    0  0  0  1    blue
    0  0  1  0    green
    0  0  1  1    cyan
    0  1  0  0    red
    0  1  0  1    magenta
    0  1  1  0    brown
    0  1  1  1    white
    1  0  0  0    gray
    1  0  0  1    light blue
    1  0  1  0    light green
    1  0  1  1    light cyan
    1  1  0  0    light red
    1  1  0  1    light magenta
    1  1  1  0    yellow
    1  1  1  1    high intensity white


The CGA mode can have a maximum of 16K bytes of video memory due to its
inherent design structure. The 16K bytes of memory can be used in three different ways.


1. Text mode of 80 x 25 characters. This takes a total of 2K bytes (80 x 25 = 2000) for
the characters plus 2K bytes of memory for their attributes, since each character has
one attribute byte. That means that each screen (frame) takes 4K bytes, and that results
in CGA supporting a total of four pages of data, where each page represents one full
screen. In this mode, 16 colors are supported. To select this mode, use AL = 03 for
mode selection in INT 10H option AH = 00.

2. Graphics mode of 320 x 200 (medium resolution). In this mode there are a total of
64,000 pixels (320 columns x 200 rows = 64,000). Dividing the total video RAM
memory of 128K bits (16K x 8 bits = 128K bits) by the 64,000 pixels gives 2 bits for
the color of each pixel. These 2 bits give four possibilities. Therefore, the 320 x 200
resolution CGA can support no more than 4 colors. To select this mode, use AL = 04.

3. Graphics mode of 640 x 200 (high resolution). In this mode there are a total of
128,000 pixels (200 x 640 = 128,000). Dividing the 128K bits of video memory by this
gives 1 bit (128,000/128,000 = 1) for the color of each pixel. The bit can be on (white)
or off (black). Therefore, the 640 x 200 high-resolution CGA can support only black
and white. To select this mode, use AL = 06.


Example 4-4 


Write a program that puts 20H (ASCII space) on the entire screen. Use high-intensity white on 
a blue background attribute for any characters to be displayed. 


Solution: 


        MOV  AH,00        ;SET MODE OPTION
        MOV  AL,03        ;CGA COLOR TEXT MODE OF 80 x 25
        INT  10H
        MOV  AH,09        ;DISPLAY OPTION
        MOV  BH,00        ;PAGE 0
        MOV  AL,20H       ;ASCII FOR SPACE
        MOV  CX,800H      ;REPEAT IT 800H TIMES
        MOV  BL,1FH       ;HIGH-INTENSITY WHITE ON BLUE
        INT  10H




From the discussion above one can conclude that with a fixed amount of video
RAM, the number of supported colors decreases as the resolution increases. This is why
a video board that offers more colors at a given resolution must have more video memory
in which to store the extra color bits.


INT 10H and pixel programming 


To address a single pixel on the screen, use INT 10H with AH = 0CH. The X and
Y coordinates of the pixel must be known. The values for X (column) and Y (row) vary,
depending on the resolution of the monitor. The registers holding these values are CX =
the column point (the X coordinate) and DX = the row point (Y coordinate). If the dis-
play mode supports more than one page, BH = page number; otherwise, it is ignored. To
turn the pixel on or off in black and white, use AL = 1 or AL = 0. The value of AL can be
modified for various colors.
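
As an illustration of a color value in AL, the following fragment is a minimal sketch that selects the 320 x 200 four-color mode and turns on one pixel near the center of the screen. The coordinates and the color number 2 are arbitrary choices for this sketch; which color appears for the value 2 depends on the palette in effect.

```asm
        MOV  AH,00        ;set mode option
        MOV  AL,04        ;mode 04 = 320 x 200 graphics, 4 colors
        INT  10H          ;change the video mode
        MOV  AH,0CH       ;write pixel option
        MOV  AL,02        ;color number 2 (valid values are 0 - 3 in this mode)
        MOV  BH,00        ;page 0
        MOV  CX,160       ;column (X coordinate), 0 - 319
        MOV  DX,100       ;row (Y coordinate), 0 - 199
        INT  10H          ;turn on the pixel
```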


Drawing horizontal or vertical lines in graphics mode 


To draw a horizontal line, choose values for the row and column to point to the 
beginning of the line and then continue to increment the column until it reaches the end 
of the line, as shown in Example 4-5. 


Example 4-5 


Write a program to: (a) clear the screen, (b) set the mode to CGA of 640 x 200 resolution, and 
(c) draw a horizontal line starting at column = 100, row = 50, and ending at column 200, row 50. 


Solution: 

        MOV  AX,0600H     ;SCROLL THE SCREEN
        MOV  BH,07        ;NORMAL ATTRIBUTE
        MOV  CX,0000      ;FROM ROW=00, COLUMN=00
        MOV  DX,184FH     ;TO ROW=18H, COLUMN=4FH
        INT  10H          ;INVOKE INTERRUPT TO CLEAR SCREEN
        MOV  AH,00        ;SET MODE
        MOV  AL,06        ;MODE = 06 (CGA HIGH RESOLUTION)
        INT  10H          ;INVOKE INTERRUPT TO CHANGE MODE
        MOV  CX,100       ;START LINE AT COLUMN = 100 AND
        MOV  DX,50        ;ROW = 50
BACK:   MOV  AH,0CH       ;AH=0CH TO DRAW A LINE
        MOV  AL,01        ;PIXELS = WHITE
        INT  10H          ;INVOKE INTERRUPT TO DRAW LINE
        INC  CX           ;INCREMENT HORIZONTAL POSITION
        CMP  CX,200       ;DRAW LINE UNTIL COLUMN = 200
        JNZ  BACK


As an exercise, put INT 3 at the end of the program above and run it in DEBUG 
to get a feeling of the concept. To draw a vertical line, simply increment the vertical value 
held by the DX register and keep CX constant. The linear equation y = mx + b can be used 
to draw any line. 
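
The vertical case can be sketched as follows. This fragment assumes that the screen has already been placed in mode 06 (640 x 200), as in Example 4-5; the label VLINE and the end points are arbitrary choices for this illustration.

```asm
        MOV  CX,100       ;column stays constant at 100
        MOV  DX,50        ;start the line at row 50
VLINE:  MOV  AH,0CH       ;write pixel option
        MOV  AL,01        ;pixel = white
        INT  10H          ;turn on the pixel at (CX,DX)
        INC  DX           ;increment vertical position
        CMP  DX,150       ;stop when row = 150
        JNZ  VLINE        ;pixels are drawn on rows 50 through 149
```

Like Example 4-5, the loop relies on CX and DX keeping their values across INT 10H.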


Review Questions 


1. Interrupt 10H function calls perform what services? 
2. The monitor in text mode has ___ columns and ___ rows. The top left position
is (__, __) and the bottom right position is (__, __).
3. Fill in the blanks in the following program, which clears the screen. Write comments
on each line stating the purpose of each line of code.
        MOV  AH,__
        MOV  AL,__
        MOV  BH,__
        MOV  CH,__
        MOV  CL,__
        MOV  DH,__
        MOV  DL,__
        INT  10H
4. INT 10H function AH = 03 was used. Afterward, DH = 05 and DL = 34. What does
this indicate?
5. What is the purpose of the attribute byte for monochrome monitors?
6. In text mode, there is one attribute byte associated with each ___ on the screen.
7. Write the attribute byte to display background green, foreground white blinking.
8. State the purpose of the following program, which is for a monochrome monitor.
        MOV  AH,02
        MOV  BH,00
        MOV  DX,0000
        INT  10H
        MOV  AH,09
        MOV  BH,00
        MOV  AL,2AH
        MOV  CX,80
        MOV  BL,0F0H
        INT  10H


SECTION 4.2: DOS INTERRUPT 21H 


INT 21H is provided by DOS in contrast to INT 10H, which is BIOS-ROM based. 
When the OS is loaded into the computer, INT 21H can be invoked to perform some 
extremely useful functions. These functions are commonly referred to as DOS INT 21H 
function calls. A partial list of these options is provided in Appendix D. In this section we 
use only the options dealing with inputting information from the keyboard and displaying 
it on the screen. In previous chapters, a fixed set of data was defined in the data segment 
and the results were viewed in a memory dump. Starting with this chapter, data will come 
from the keyboard and after it is processed, the results will be displayed on the screen. 
This is a much more dynamic way of processing information and is the main reason for 
placing this chapter at this point of the book. Although data is input and output through 
the keyboard and monitor, there is still a need to dump memory to verify the data when 
troubleshooting programs. 


INT 21H option 09: outputting a string of data to the monitor 


INT 21H can be used to send a set of ASCII data to the monitor. To do that, the 
following registers must be set: AH = 09 and DX = the offset address of the ASCII data 
to be displayed. Then INT 21H is invoked. The address in the DX register is an offset 
address and DS is assumed to be the data segment. INT 21H option 09 will display the 
ASCII data string pointed at by DX until it encounters the dollar sign "$". In the absence 
of encountering a dollar sign, DOS function call 09 will continue to display any garbage
that it can find in subsequent memory locations until it finds "$". For example, to display 
the message "The earth is but one country", the following is from the data segment and 
code segment. 


DATA_ASC DB  'The earth is but one country','$'
        MOV  AH,09                ;option 09 to display string of data
        MOV  DX,OFFSET DATA_ASC   ;DX = offset address of data
        INT  21H                  ;invoke the interrupt


INT 21H option 02: outputting a single character to the monitor 


There are occasions when it is necessary to output to the monitor only a single 
character. To do that, 02 is put in AH, DL is loaded with the character to be displayed, and 
then INT 21H is invoked. The following displays the letter "J". 


        MOV  AH,02        ;option 02 displays one character
        MOV  DL,'J'       ;DL holds the character to be displayed
        INT  21H          ;invoke the interrupt




This option can also be used to display '$' on the monitor since the string display 
option (option 09) will not display '$'. 
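
For example, a line such as "Total: $25" cannot be sent with a single call to option 09. The following sketch, in which the names COST1 and COST2 are hypothetical, splits the text around the dollar sign and uses option 02 for the '$' itself.

```asm
;in the data segment (COST1 and COST2 are hypothetical names)
COST1   DB   'Total: ','$'
COST2   DB   '25','$'

;in the code segment, with DS already pointing to the data segment
        MOV  AH,09              ;display the text before the dollar sign
        MOV  DX,OFFSET COST1
        INT  21H
        MOV  AH,02              ;display the dollar sign itself
        MOV  DL,'$'
        INT  21H
        MOV  AH,09              ;display the text after the dollar sign
        MOV  DX,OFFSET COST2
        INT  21H
```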


INT 21H option 01: inputting a single character, with echo 


This function waits until a character is input from the keyboard, then echoes it to
the monitor. After the interrupt, the input character will be in AL.


        MOV  AH,01        ;option 01 inputs one character
        INT  21H          ;after the interrupt, AL = input character (ASCII)


Program 4-1 combines INT 10H and INT 21H. The program does the following: 
(1) clears the screen, (2) sets the cursor to the center of the screen, and (3) starting at that 
point of the screen, displays the message "This is a test of the display routine". 


INT 21H option 0AH: inputting a string of data from the keyboard


Option 0AH of INT 21H provides a means by which one can get data from the
keyboard and store it in a predefined area of memory in the data segment. To do that, reg-
isters are set as follows: AH = 0AH and DX = offset address at which the string of data is
stored. This is commonly referred to as a buffer area. DOS requires that a buffer area be
defined in the data segment and the first byte specifies the size of the buffer. DOS will
put the number of characters that came in through the keyboard in the second byte and the
keyed-in data is placed in the buffer starting at the third byte. For example, the following
program will accept up to six characters from the keyboard, including the return (carriage
return) key. Six locations were reserved for the buffer and filled with FFH. The following
shows portions of the data segment and code segment.


        ORG  0010H
DATA1   DB   6,?,6 DUP (0FFH)     ;0010H=06, 0012H to 0017H = FF
        MOV  AH,0AH               ;string input option of INT 21H
        MOV  DX,OFFSET DATA1      ;load the offset address of buffer
        INT  21H                  ;invoke interrupt 21H


The following shows the memory contents of offset 0010H: 


0010 0011 0012 0013 0014 0015 0016 0017
 06   00   FF   FF   FF   FF   FF   FF


When this program is executed, the computer waits for the information to come
in from the keyboard. When the data comes in, the IBM PC will not exit the INT 21H rou-
tine until it encounters the return key. Assuming the data that was entered through the
keyboard was "USA" <RETURN>, the contents of memory locations starting at offset
0010H would look like this:


0010 0011 0012 0013 0014 0015 0016 0017
 06   03   55   53   41   0D   FF   FF
           U    S    A    CR


The following is a step-by-step analysis: 


0010H = 06    DOS requires the size of the buffer in the first location.
0011H = 03    The keyboard was activated three times (excluding the RETURN key) to
              key in the letters U, S, and A.
0012H = 55H   This is the ASCII hex value for letter U.
0013H = 53H   This is the ASCII hex value for letter S.
0014H = 41H   This is the ASCII hex value for letter A.
0015H = 0DH   This is the ASCII hex value for CR (carriage return).




TITLE   PROG4-1 SIMPLE DISPLAY PROGRAM
PAGE    60,132
        .MODEL SMALL
        .STACK 64
        .DATA
MESSAGE DB   'This is a test of the display routine','$'
        .CODE
MAIN    PROC
        MOV  AX,@DATA
        MOV  DS,AX
        CALL CLEAR              ;CLEAR THE SCREEN
        CALL CURSOR             ;SET CURSOR POSITION
        CALL DISPLAY            ;DISPLAY MESSAGE
        MOV  AH,4CH
        INT  21H                ;GO BACK TO DOS
MAIN    ENDP

;THIS SUBROUTINE CLEARS THE SCREEN
CLEAR   PROC
        MOV  AX,0600H           ;SCROLL SCREEN FUNCTION
        MOV  BH,07              ;NORMAL ATTRIBUTE
        MOV  CX,0000            ;SCROLL FROM ROW=00,COL=00
        MOV  DX,184FH           ;TO ROW=18H,COL=4FH
        INT  10H                ;INVOKE INTERRUPT TO CLEAR SCREEN
        RET
CLEAR   ENDP

;THIS SUBROUTINE SETS THE CURSOR AT THE CENTER OF THE SCREEN
CURSOR  PROC
        MOV  AH,02              ;SET CURSOR FUNCTION
        MOV  BH,00              ;PAGE 00
        MOV  DH,12              ;CENTER ROW
        MOV  DL,39              ;CENTER COLUMN
        INT  10H                ;INVOKE INTERRUPT TO SET CURSOR POSITION
        RET
CURSOR  ENDP

;THIS SUBROUTINE DISPLAYS A STRING ON THE SCREEN
DISPLAY PROC
        MOV  AH,09              ;DISPLAY FUNCTION
        MOV  DX,OFFSET MESSAGE  ;DX POINTS TO OUTPUT BUFFER
        INT  21H                ;INVOKE INTERRUPT TO DISPLAY STRING
        RET
DISPLAY ENDP
        END  MAIN


Program 4-1


One might ask where the value 03 in 0011H came from. DOS puts that value there 
to indicate that three characters were entered. How can this character count byte be 
accessed? See the following: 


        MOV  AH,0AH
        MOV  DX,OFFSET DATA1
        INT  21H




;After data has been keyed in, next fetch the count value
        MOV  BX,OFFSET DATA1
        SUB  CH,CH              ;CH=00
        MOV  CL,[BX]+1          ;move count to CL


To locate the CR value 0DH in the string and replace it, say, with 00, simply code
the following lines next:


        MOV  SI,CX
        MOV  BYTE PTR [BX+SI]+2,00


The actual keyed-in data is located beginning at location [BX]+2. 
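
The same indexing can be used to hand the keyed-in text to option 09. The following fragment is a sketch under the assumptions that DATA1 is the buffer defined above, that option 0AH has just returned, and that DS points to the data segment. It overwrites the CR with '$', moves the cursor down one line, and displays the text.

```asm
        MOV  BX,OFFSET DATA1
        SUB  CH,CH                  ;CH = 00
        MOV  CL,[BX+1]              ;CX = number of characters keyed in
        MOV  SI,CX                  ;SI = index of the CR within the data
        MOV  BYTE PTR [BX+SI+2],'$' ;replace the CR with the terminator
        MOV  AH,02                  ;output a line feed so that the text
        MOV  DL,0AH                 ;appears on the line below the input
        INT  21H
        MOV  AH,09                  ;display the keyed-in text
        MOV  DX,OFFSET DATA1+2      ;data starts at the third byte
        INT  21H
```

If the user presses only the RETURN key, the count is 0 and the '$' lands on the first data byte, so nothing is displayed.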


Inputting more than the buffer size


Now what happens if more than six characters (five, the maximum length + the
CR = 6) are keyed in? Entering a message like "USA a country in North America"
<RETURN> will cause the computer to sound the speaker and the contents of the buffer
will look like this:


0010 0011 0012 0013 0014 0015 0016 0017
 06   05   55   53   41   20   61   0D
           U    S    A    SP   a    CR


Location 0015 has ASCII 20H for space, and 0016 has ASCII 61H for "a", and
finally, the 0D for RETURN key is at 0017. The actual length is 05 at memory offset
0011H. Another question is: What happens if only the CR key is activated and no other
character is entered? For example, in the following,


        ORG  20H
DATA4   DB   10,?,10 DUP (0FFH)


which puts 0AH in memory 0020H, the 0021H is for the count and the 0022H is the first
location that will have the data that was entered. So if only the return key is activated,
0022H has 0DH, the hex code for CR.


0020 0021 0022 0023 0024 0025 0026 0027 0028 0029 002A 002B 002C
 0A   00   0D   FF   FF   FF   FF   FF   FF   FF   FF   FF   FF


The actual number of characters entered is 0 at location 0021. Remember that CR 
is not included in the count. It must be noted that as data is entered it is displayed on the 
screen. This is called an echo. So the 0AH option of INT 21H accepts the string of data
from the keyboard and echoes (displays) it on the screen as it is keyed in. 
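
Because the count byte is 0 when only the RETURN key is pressed, a program can test it before processing the data. The following fragment is a minimal sketch that uses the DATA4 buffer defined above and simply asks again until at least one character has been entered; the label RETRY is an arbitrary name.

```asm
RETRY:  MOV  AH,0AH               ;string input option
        MOV  DX,OFFSET DATA4      ;DX = offset address of the buffer
        INT  21H
        CMP  BYTE PTR DATA4+1,0   ;count byte: was only RETURN pressed?
        JE   RETRY                ;yes, wait for another line of input
                                  ;no, DATA4+2 onward holds the characters
```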


Use of carriage return and line feed 


In Program 4-2, the EQU statement is used to equate CR (carriage return) with its
ASCII value of 0DH, and LF (line feed) with its ASCII value of 0AH. This makes the pro-
gram much more readable. Since the result of the conversion was to be displayed in the
next line, the string was preceded by CR and LF. In the absence of CR the string would 
be displayed wherever the cursor happened to be. In the case of CR and no LF, the string 
would be displayed on the same line after it had been returned to the beginning of the line 
by the CR and, consequently, would write over some of the characters on that line. 
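
The effect of the CR and LF pair can be seen with a single string. In the following sketch, TWOLINE is a hypothetical name; the CR returns the cursor to the beginning of the line and the LF moves it down one line, so the second part of the string starts on a new line.

```asm
CR      EQU  0DH
LF      EQU  0AH

;in the data segment (TWOLINE is a hypothetical name)
TWOLINE DB   'First line',CR,LF,'Second line','$'

;in the code segment, with DS already pointing to the data segment
        MOV  AH,09              ;display string option
        MOV  DX,OFFSET TWOLINE
        INT  21H
```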


Program 4-3 prompts the user to type in a name. The name can have a maximum 
of eight letters. After the name is typed in, the program gets the length of the name and 
prints it to the screen. 


Program 4-4 demonstrates many of the functions described in this chapter. 




;Program 4-2 performs the following: (1) clears the screen, (2) sets
;the cursor at the beginning of the third line from the top of the
;screen, (3) accepts the message "IBM perSonal COmputer" from the
;keyboard, (4) converts lowercase letters of the message to uppercase,
;(5) displays the converted results on the next line.
TITLE   PROG4-2
PAGE    60,132
        .MODEL SMALL
        .STACK 64
        .DATA
BUFFER  DB   22,?,22 DUP (?)        ;BUFFER FOR KEYED-IN DATA
        ORG  18H
DATAREA DB   CR,LF,22 DUP (?),'$'   ;DATA HERE AFTER CONVERSION
CR      EQU  0DH
LF      EQU  0AH
        .CODE
MAIN    PROC FAR
        MOV  AX,@DATA
        MOV  DS,AX
        CALL CLEAR              ;CLEAR THE SCREEN
        CALL CURSOR             ;SET CURSOR POSITION
        CALL GETDATA            ;INPUT A STRING INTO BUFFER
        CALL CONVERT            ;CONVERT STRING TO UPPERCASE
        CALL DISPLAY            ;DISPLAY STRING DATAREA
        MOV  AH,4CH
        INT  21H                ;GO BACK TO DOS
MAIN    ENDP

;THIS SUBROUTINE CLEARS THE SCREEN
CLEAR   PROC
        MOV  AX,0600H           ;SCROLL SCREEN FUNCTION
        MOV  BH,07              ;NORMAL ATTRIBUTE
        MOV  CX,0000            ;SCROLL FROM ROW=00,COL=00
        MOV  DX,184FH           ;TO ROW=18H,COL=4FH
        INT  10H                ;INVOKE INTERRUPT TO CLEAR SCREEN
        RET
CLEAR   ENDP

;THIS SUBROUTINE SETS THE CURSOR TO THE BEGINNING OF THE 3RD LINE
CURSOR  PROC
        MOV  AH,02              ;SET CURSOR FUNCTION
        MOV  BH,00              ;PAGE 0
        MOV  DL,01              ;COLUMN 1
        MOV  DH,03              ;ROW 3
        INT  10H                ;INVOKE INTERRUPT TO SET CURSOR
        RET
CURSOR  ENDP

;THIS SUBROUTINE DISPLAYS A STRING ON THE SCREEN
DISPLAY PROC
        MOV  AH,09              ;DISPLAY STRING FUNCTION
        MOV  DX,OFFSET DATAREA  ;DX POINTS TO BUFFER
        INT  21H                ;INVOKE INTERRUPT TO DISPLAY STRING
        RET
DISPLAY ENDP




;THIS SUBROUTINE PUTS DATA FROM THE KEYBOARD INTO A BUFFER
GETDATA PROC
        MOV  AH,0AH             ;INPUT STRING FUNCTION
        MOV  DX,OFFSET BUFFER   ;DX POINTS TO BUFFER
        INT  21H                ;INVOKE INTERRUPT TO INPUT STRING
        RET
GETDATA ENDP

;THIS SUBROUTINE CONVERTS ANY SMALL LETTER TO ITS CAPITAL
CONVERT PROC
        MOV  BX,OFFSET BUFFER
        MOV  CL,[BX]+1          ;GET THE CHAR COUNT
        SUB  CH,CH              ;CX = TOTAL CHARACTER COUNT
        MOV  DI,CX              ;INDEXING INTO BUFFER
        MOV  BYTE PTR [BX+DI]+2,20H  ;REPLACE CR WITH SPACE
        MOV  SI,OFFSET DATAREA+2     ;STRING ADDRESS
AGAIN:  MOV  AL,[BX]+2          ;GET THE KEYED-IN DATA
        CMP  AL,61H             ;CHECK FOR 'a'
        JB   NEXT               ;IF BELOW, GO TO NEXT
        CMP  AL,7AH             ;CHECK FOR 'z'
        JA   NEXT               ;IF ABOVE, GO TO NEXT
        AND  AL,11011111B       ;CONVERT TO CAPITAL
NEXT:   MOV  [SI],AL            ;PLACE IN DATA AREA
        INC  SI                 ;INCREMENT POINTERS
        INC  BX
        LOOP AGAIN              ;LOOP IF COUNTER NOT ZERO
        RET
CONVERT ENDP
        END  MAIN


Program 4-2


INT 21H option 07: keyboard input without echo 


Option 07 of INT 21H requires the user to enter a single character but that char- 
acter is not displayed (or echoed) on the screen. After execution of the interrupt, the PC 
waits until a single character is entered and provides the character in AL. 


        MOV  AH,07        ;keyboard input without echo
        INT  21H
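
One common use of input without echo is to hide what is being typed. The following fragment is a sketch only: it reads keys until RETURN is pressed and displays an asterisk for each one. The labels NEXTKEY and DONE are arbitrary names, and the keyed-in characters are not stored anywhere, which a real program would need to do.

```asm
NEXTKEY: MOV  AH,07       ;read one character, no echo
         INT  21H         ;AL = character keyed in
         CMP  AL,0DH      ;was the RETURN key pressed?
         JE   DONE        ;yes, stop reading
         MOV  AH,02       ;no, display '*' in place of the character
         MOV  DL,'*'
         INT  21H
         JMP  NEXTKEY     ;wait for the next key
DONE:                     ;execution continues here after RETURN
```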


Using the LABEL directive to define a string buffer 


A more systematic way of defining the buffer area for the string input is to use the 
LABEL directive. The LABEL directive can be used in the data segment to assign multi- 
ple names to data. When used in the data segment it looks like this: 


name LABEL attribute 


The attribute can be either BYTE, WORD, DWORD, FWORD, QWORD, or 
TBYTE. Simply put, the LABEL directive is used to assign the same offset address to 
two names. For example, in the following, 


JOE LABEL BYTE 
TOM DB 20 DUP(0) 


the offset address assigned to JOE is the same offset address for TOM since the LABEL
directive does not occupy any memory space (see Appendix C for many examples of the
use of the LABEL directive).
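
The two names do not need to have the same attribute. In the following sketch, in which all three names are hypothetical, LABEL WORD gives a word-sized name to two bytes that are also defined individually, so the same memory can be accessed either way.

```asm
;in the data segment (all names are hypothetical)
PAIR_W  LABEL WORD        ;word-sized name for the next two bytes
PAIR_LO DB   34H          ;low byte, at the lower address
PAIR_HI DB   12H          ;high byte

;in the code segment, with DS already pointing to the data segment
        MOV  AX,PAIR_W    ;AX = 1234H (low byte is stored first)
        MOV  BL,PAIR_LO   ;BL = 34H
```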




TITLE   PROG4-3 READS IN LAST NAME AND DISPLAYS LENGTH
PAGE    60,132
        .MODEL SMALL
        .STACK 64
        .DATA
MESSAGE1 DB  'What is your last name?','$'
        ORG  20H
BUFFER1 DB   9,?,9 DUP (0)
        ORG  30H
MESSAGE2 DB  CR,LF,'The number of letters in your name is: ','$'
ROW     EQU  08
COLUMN  EQU  05
CR      EQU  0DH          ;EQUATE CR WITH ASCII CODE FOR CARRIAGE RETURN
LF      EQU  0AH          ;EQUATE LF WITH ASCII CODE FOR LINE FEED
        .CODE
MAIN    PROC FAR
        MOV  AX,@DATA
        MOV  DS,AX
        CALL CLEAR
        CALL CURSOR
        MOV  AH,09                ;DISPLAY THE PROMPT
        MOV  DX,OFFSET MESSAGE1
        INT  21H
        MOV  AH,0AH               ;GET LAST NAME FROM KEYBOARD
        MOV  DX,OFFSET BUFFER1
        INT  21H
        MOV  BX,OFFSET BUFFER1    ;FIND OUT NUMBER OF LETTERS IN NAME
        MOV  CL,[BX+1]            ;GET NUMBER OF LETTERS
        OR   CL,30H               ;MAKE IT ASCII
        MOV  MESSAGE2+40,CL       ;PLACE AT END OF STRING
        MOV  AH,09                ;DISPLAY SECOND MESSAGE
        MOV  DX,OFFSET MESSAGE2
        INT  21H
        MOV  AH,4CH
        INT  21H                  ;GO BACK TO DOS
MAIN    ENDP

CLEAR   PROC                      ;CLEAR THE SCREEN
        MOV  AX,0600H
        MOV  BH,07
        MOV  CX,0000
        MOV  DX,184FH
        INT  10H
        RET
CLEAR   ENDP

CURSOR  PROC                      ;SET CURSOR POSITION
        MOV  AH,02
        MOV  BH,00
        MOV  DL,COLUMN
        MOV  DH,ROW
        INT  10H
        RET
CURSOR  ENDP
        END  MAIN


Program 4-3




Write a program to perform the following: (1) clear the screen, (2) set the cursor at row 5 and 
column 1 of the screen, (3) prompt "There is a message for you from Mr. Jones. To read it enter 
Y ". If the user enters 'Y' or 'y' then the message "Hi! I must leave town tomorrow, therefore I will 
not be able to see you" will appear on the screen. If the user enters any other key, then the prompt 
"No more messages for you" should appear on the next line. 


TITLE   PROGRAM 4-4
PAGE    60,132
        .MODEL SMALL
        .STACK 64
        .DATA
PROMPT1 DB   'There is a message for you from Mr. Jones. '
        DB   'To read it enter Y','$'
MESSAGE DB   CR,LF,'Hi! I must leave town tomorrow, '
        DB   'therefore I will not be able to see you','$'
PROMPT2 DB   CR,LF,'No more messages for you','$'
CR      EQU  0DH
LF      EQU  0AH
        .CODE
MAIN    PROC
        MOV  AX,@DATA
        MOV  DS,AX
        CALL CLEAR                ;CLEAR THE SCREEN
        CALL CURSOR               ;SET CURSOR POSITION
        MOV  AH,09                ;DISPLAY THE PROMPT
        MOV  DX,OFFSET PROMPT1
        INT  21H
        MOV  AH,07                ;GET ONE CHAR, NO ECHO
        INT  21H
        CMP  AL,'Y'               ;IF 'Y', CONTINUE
        JZ   OVER
        CMP  AL,'y'
        JZ   OVER
        MOV  AH,09                ;DISPLAY SECOND PROMPT IF NOT Y
        MOV  DX,OFFSET PROMPT2
        INT  21H
        JMP  EXIT
OVER:   MOV  AH,09                ;DISPLAY THE MESSAGE
        MOV  DX,OFFSET MESSAGE
        INT  21H
EXIT:   MOV  AH,4CH
        INT  21H                  ;GO BACK TO DOS
MAIN    ENDP

CLEAR   PROC                      ;CLEARS THE SCREEN
        MOV  AX,0600H
        MOV  BH,07
        MOV  CX,0000
        MOV  DX,184FH
        INT  10H
        RET
CLEAR   ENDP

CURSOR  PROC                      ;SET CURSOR POSITION
        MOV  AH,02
        MOV  BH,00
        MOV  DL,05                ;COLUMN 5
        MOV  DH,08                ;ROW 8
        INT  10H
        RET
CURSOR  ENDP
        END  MAIN


Program 4-4




TITLE   PROGRAM 4-4 REWRITTEN WITH FULL SEGMENT DEFINITION
PAGE    60,132
STSEG   SEGMENT
        DB   64 DUP (?)
STSEG   ENDS

DTSEG   SEGMENT
PROMPT1 DB   'There is a message for you from Mr. Jones. '
        DB   'To read it enter Y','$'
MESSAGE DB   CR,LF,'Hi! I must leave town tomorrow, '
        DB   'therefore I will not be able to see you','$'
PROMPT2 DB   CR,LF,'No more messages for you','$'
DTSEG   ENDS

CR      EQU  0DH
LF      EQU  0AH

CDSEG   SEGMENT
MAIN    PROC FAR
        ASSUME CS:CDSEG,DS:DTSEG,SS:STSEG
        MOV  AX,DTSEG
        MOV  DS,AX
        CALL CLEAR                ;CLEAR THE SCREEN
        CALL CURSOR               ;SET CURSOR POSITION
        MOV  AH,09                ;DISPLAY THE PROMPT
        MOV  DX,OFFSET PROMPT1
        INT  21H
        MOV  AH,07                ;GET ONE CHAR, NO ECHO
        INT  21H
        CMP  AL,'Y'               ;IF 'Y', CONTINUE
        JZ   OVER
        CMP  AL,'y'
        JZ   OVER
        MOV  AH,09                ;DISPLAY SECOND PROMPT IF NOT Y
        MOV  DX,OFFSET PROMPT2
        INT  21H
        JMP  EXIT
OVER:   MOV  AH,09                ;DISPLAY THE MESSAGE
        MOV  DX,OFFSET MESSAGE
        INT  21H
EXIT:   MOV  AH,4CH
        INT  21H                  ;GO BACK TO DOS
MAIN    ENDP

CLEAR   PROC                      ;CLEARS THE SCREEN
        MOV  AX,0600H
        MOV  BH,07
        MOV  CX,0000
        MOV  DX,184FH
        INT  10H
        RET
CLEAR   ENDP

CURSOR  PROC                      ;SET CURSOR POSITION
        MOV  AH,02
        MOV  BH,00
        MOV  DL,05                ;COLUMN 5
        MOV  DH,08                ;ROW 8
        INT  10H
        RET
CURSOR  ENDP
CDSEG   ENDS
        END  MAIN


Program 4-4 (rewritten with full segment definition)




Next we show how to use this directive to define a buffer area for the string key- 
board input: 


DATA_BUF  LABEL BYTE
MAX_SIZE  DB   10
BUF_COUNT DB   ?
BUF_AREA  DB   10 DUP (20H)


Now in the code segment the data can be accessed by name as follows:


        MOV  AH,0AH               ;load string into buffer
        MOV  DX,OFFSET DATA_BUF
        INT  21H
        MOV  CL,BUF_COUNT         ;load the actual length of string
        MOV  SI,OFFSET BUF_AREA   ;SI=address of first byte of string

This is much more structured and easier to follow. By using this method, it is easy 
to refer to any parameter by its name. For example, using the LABEL directive, one can 
rewrite the CONVERT subroutine in Program 4-2 as follows: 

;In the data segment the BUFFER is redefined as
BUFFER   LABEL BYTE
BUFSIZE  DB   22
BUFCOUNT DB   ?
REALDATA DB   22 DUP (?)

;and in the code segment, in place of the CONVERT procedure:
CONVERT PROC
        MOV  CL,BUFCOUNT          ;load the counter
        SUB  CH,CH                ;CX=counter
        MOV  DI,CX                ;index into data field
        MOV  BX,OFFSET REALDATA   ;actual data address in buffer
        MOV  BYTE PTR [BX+DI],20H ;replace the CR with space
        MOV  SI,OFFSET DATAREA+2  ;SI=address of converted data (past CR,LF)
AGAIN:  MOV  AL,[BX]              ;move the char into AL
        CMP  AL,61H               ;check if is below 'a'
        JB   NEXT                 ;if yes then go to next
        CMP  AL,7AH               ;check for above 'z'
        JA   NEXT                 ;if yes then go to next
        AND  AL,11011111B         ;if not then mask it to capital
NEXT:   MOV  [SI],AL              ;move the character
        INC  SI                   ;increment the pointer
        INC  BX                   ;increment the pointer
        LOOP AGAIN                ;repeat if CX not zero yet
        RET                       ;return to main procedure
CONVERT ENDP


Review Questions


1. INT ___ function calls reside in ROM BIOS, whereas INT ___ function calls are
provided by DOS.
2. What is the difference between the following two programs?
        MOV  AH,09                    MOV  AH,0AH
        MOV  DX,OFFSET BUFFER         MOV  DX,OFFSET BUFFER
        INT  21H                      INT  21H
3. INT 21H function 09 will display a string of data beginning at the location specified
in register DX. How does the system know where the end of the string is?
4. Fill in the blanks to display the following string using INT 21H.
MESSAGE1 DB  'What is your last name?$'
        MOV  AH,__
        MOV  DX,__
        INT  21H
5. The following prompt needs to be displayed. What will happen if this string is out-
put using INT 21H function 09?
PROMPT1 DB   'Enter (to nearest $) your annual salary'
6. Use the EQU directive to equate the name "BELL" with the ASCII code for sounding
the bell.
7. Write a program to sound the bell.
8. Code the data definition directives for a buffer area where INT 21H Option 0AH will
input a social security number.


SECTION 4.3: WHAT IS A MACRO AND HOW IS IT USED? 


There are applications in Assembly language programming where a group of 
instructions performs a task that is used repeatedly. For example, INT 21H function 09H
for displaying a string of data and function 0AH for keying in data are used repeatedly in
the same program. So it does not make sense to rewrite them every time they are needed. 
Therefore, to reduce the time that it takes to write these codes and reduce the possibility 
of errors, the concept of macros was born. Macros allow the programmer to write the task 
(set of codes to perform a specific job) once only and to invoke it whenever it is needed, 
wherever it is needed. 


MACRO definition 


Every macro definition must have three parts, as follows: 


name    MACRO dummy1,dummy2,...,dummyN
ENDM 

The MACRO directive indicates the beginning of the macro definition and the 
ENDM directive signals the end. What goes in between the MACRO and ENDM directives 
is called the body of the macro. The name must be unique and must follow Assembly 
language naming conventions. The dummies are names, parameters, or even registers 
that are mentioned in the body of the macro. After the macro has been written, it can be 
invoked (or called) by its name, and appropriate values are substituted for the dummy 
parameters. Displaying a string of data using function 09 of INT 21H is a widely used 
service. The following is a macro for that service: 


STRING MACRO DATA1 
MOV AH, 09 
MOV DX,OFFSET DATA1 
INT 21H 
ENDM 


The above code is the macro definition. Note that dummy argument DATA1 is 
mentioned in the body of the macro. In the following example, assume that a prompt has 
already been defined in the data segment as shown below. In the code segment, the macro 
can be invoked by its name with the user's actual data: 

MESSAGE1 DB 'What is your name?','$' 
STRING MESSAGE1 


The instruction "STRING MESSAGE1" invokes the macro. 




The assembler expands the macro by providing the following code in the .LST 
file: 


1       MOV AH,09 
1       MOV DX,OFFSET MESSAGE1 
1       INT 21H 


The (1) indicates that the code is from the macro. In earlier versions of MASM, 
a plus sign (+) indicated lines from macros. 
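
Because the dummy argument is replaced textually when the macro is expanded, the same macro can be invoked with a constant or with a register. The following is a minimal sketch, not taken from the programs in this chapter: the macro name PUTCHAR is hypothetical, and it relies only on INT 21H function 02, which displays the character held in DL.

```asm
PUTCHAR MACRO CHAR              ;PUTCHAR is a hypothetical name
;;DISPLAYS ONE CHARACTER USING INT 21H FUNCTION 02
        MOV   AH,02             ;display-character function
        MOV   DL,CHAR           ;CHAR is replaced by the actual argument
        INT   21H
        ENDM

        .MODEL SMALL
        .STACK 64
        .CODE
MAIN    PROC  FAR
        PUTCHAR 'A'             ;expands with MOV DL,'A'
        MOV   BL,'B'
        PUTCHAR BL              ;expands with MOV DL,BL
        PUTCHAR 07              ;expands with MOV DL,07 (sounds the bell)
        MOV   AH,4CH
        INT   21H               ;go back to DOS
MAIN    ENDP
        END   MAIN
```

Each invocation produces its own copy of the three instructions, so the ".lst" file should show three expansions that differ only in the operand substituted for CHAR.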
Example 4-6 demonstrates the macro definition. 


Example 4-6 


Write macro definitions for setting the cursor position, displaying a string, and clearing the 
screen. 


Solution: 


CURSOR   MACRO ROW,COLUMN 
;THIS MACRO SETS THE CURSOR LOCATION TO ROW,COLUMN 
;;USING BIOS INT 10H FUNCTION 02 
         MOV  AH,02             ;SET CURSOR FUNCTION 
         MOV  BH,00             ;PAGE 00 
         MOV  DH,ROW            ;ROW POSITION 
         MOV  DL,COLUMN         ;COLUMN POSITION 
         INT  10H               ;INVOKE THE INTERRUPT 
         ENDM 

DISPLAY  MACRO STRING 
;THIS MACRO DISPLAYS A STRING OF DATA 
;;DX = ADDRESS OF STRING. USES FUNCTION 09 INT 21H. 
         MOV  AH,09             ;DISPLAY STRING FUNCTION 
         MOV  DX,OFFSET STRING  ;DX = OFFSET ADDRESS OF DATA 
         INT  21H               ;INVOKE THE INTERRUPT 
         ENDM 

CLEARSCR MACRO 
;THIS MACRO CLEARS THE SCREEN 
;;USING OPTION 06 OF INT 10H 
         MOV  AX,0600H          ;SCROLL SCREEN FUNCTION 
         MOV  BH,07             ;NORMAL ATTRIBUTE 
         MOV  CX,0              ;FROM ROW=00,COLUMN=00 
         MOV  DX,184FH          ;TO ROW=18H,COLUMN=4FH 
         INT  10H               ;INVOKE THE INTERRUPT 
         ENDM 


Remember that the comments marked with ";;" will not be listed in the list file as seen in the list 
file for Program 4-5. 


Comments in a macro 


Now the question is: Can macros contain comments? The answer is yes, but there 
is a way to suppress comments and make the assembler show only the lines that generate 
opcodes. There are basically two types of comments in a macro: listable and nonlistable. 
If comments are preceded by a single semicolon (;), as is done in Assembly language 
programming, they will show up in the ".lst" file, but if comments are preceded by 
a double semicolon (;;), they will not appear in the ".lst" file when the program is 
assembled. There are also three directives designed to make programs that use macros more 
readable, meaning that they only affect the ".lst" file and have no effect on the ".obj" or 




Using the macro definition in Example 4-6, write a program that clears the screen and then at 
each of the following screen locations displays the indicated message: 


at row 2 and column 4    "My name" 
at row 7 and column 24   "is Joe" 
at row 12 and column 44  "What is" 
at row 19 and column 64  "your name?" 

TITLE    PROG4-5 
PAGE     60,132 

CLEARSCR MACRO 
;THIS MACRO CLEARS THE SCREEN 
;;USING OPTION 06 OF INT 10H 
         MOV  AX,0600H          ;SCROLL SCREEN FUNCTION 
         MOV  BH,07             ;NORMAL ATTRIBUTE 
         MOV  CX,0              ;FROM ROW=00,COLUMN=00 
         MOV  DX,184FH          ;TO ROW=18H,COLUMN=4FH 
         INT  10H               ;INVOKE THE INTERRUPT 
         ENDM 

DISPLAY  MACRO STRING 
;THIS MACRO DISPLAYS A STRING OF DATA 
;;DX = ADDRESS OF STRING. USES FUNCTION 09 INT 21H. 
         MOV  AH,09             ;DISPLAY STRING FUNCTION 
         MOV  DX,OFFSET STRING  ;DX = OFFSET ADDRESS OF DATA 
         INT  21H               ;INVOKE THE INTERRUPT 
         ENDM 

CURSOR   MACRO ROW,COLUMN 
;THIS MACRO SETS THE CURSOR LOCATION TO ROW,COLUMN 
;;USING BIOS INT 10H FUNCTION 02 
         MOV  AH,02             ;SET CURSOR FUNCTION 
         MOV  BH,00             ;PAGE 00 
         MOV  DH,ROW            ;ROW POSITION 
         MOV  DL,COLUMN         ;COLUMN POSITION 
         INT  10H               ;INVOKE THE INTERRUPT 
         ENDM 

         .MODEL SMALL 
         .STACK 64 
         .DATA 
MESSAGE1 DB   'My name ','$' 
MESSAGE2 DB   'is Joe','$' 
MESSAGE3 DB   'What is ','$' 
MESSAGE4 DB   'your name?','$' 
         .CODE 
MAIN     PROC FAR 
         MOV  AX,@DATA 
         MOV  DS,AX 
         .LALL                  ;LIST ALL 
         CLEARSCR               ;INVOKE CLEAR SCREEN MACRO 
         CURSOR 2,4             ;SET CURSOR TO ROW 2,COL 4 
         DISPLAY MESSAGE1       ;INVOKE DISPLAY MACRO 
         .XALL                  ;LIST ALL EXECUTABLE 
         CURSOR 7,24            ;SET CURSOR TO ROW 7,COL 24 
         DISPLAY MESSAGE2       ;INVOKE DISPLAY MACRO 
         .SALL                  ;SUPPRESS ALL 
         CURSOR 12,44           ;SET CURSOR TO ROW 12,COL 44 
         DISPLAY MESSAGE3       ;INVOKE DISPLAY MACRO 
         CURSOR 19,64           ;SET CURSOR TO ROW 19,COL 64 
         DISPLAY MESSAGE4       ;INVOKE DISPLAY MACRO 
         MOV  AH,4CH 
         INT  21H               ;GO BACK TO DOS 
MAIN     ENDP 
         END  MAIN 
Program 4-5 


".exe" files. They are as follows: 

.LALL (List ALL) will list all the instructions and comments that are preceded by 
a single semicolon in the ".lst" file. The comments preceded by a double semicolon 
cannot be listed in the ".lst" file in any way. 

.SALL (Suppress ALL) is used to make the list file shorter and easier to read. It 
suppresses the listing of the macro body and the comments. This is especially useful if 
the macro is invoked many times within the same program and there is no need to see it 
listed every time. It must be emphasized that the use of .SALL will not eliminate any 
opcode from the object file. It only affects the listing in the ".lst" file. 

.XALL (eXecutable ALL), which is the default listing directive, is used to list 
only the part of the macro that generates opcodes. 
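
The effect of the three listing directives can be tried on a very small macro. The following is a sketch under stated assumptions rather than one of the chapter's programs: the macro name BEEP is hypothetical, and the remarks about what appears in the ".lst" file simply restate the descriptions of .LALL, .XALL, and .SALL given above.

```asm
BEEP    MACRO                   ;BEEP is a hypothetical name
;SOUND THE BELL ONCE
;;USES INT 21H FUNCTION 02 WITH DL=07
        MOV   AH,02             ;display-character function
        MOV   DL,07             ;ASCII bell character
        INT   21H
        ENDM

        .MODEL SMALL
        .STACK 64
        .CODE
MAIN    PROC  FAR
        .LALL                   ;list instructions and the ";" comment
        BEEP
        .XALL                   ;list only the lines that generate opcodes
        BEEP
        .SALL                   ;suppress the listing of the expansion
        BEEP
        MOV   AH,4CH
        INT   21H               ;go back to DOS
MAIN    ENDP
        END   MAIN
```

All three invocations place the same three instructions in the object file; only the ".lst" file differs. The comment that begins with ";;" should not appear in any of the three expansions.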


Analysis of Program 4-5 


Compare the ".asm" and ".lst" files to see the use of .LALL, .XALL, and .SALL. 
The .LALL directive was used for each macro and then .XALL was used for two of them. 
From then on, all were suppressed. 


Microsoft (R) Macro Assembler Version 5.10              1/13/92 00:17:15 
                                                        Page 1-1 
         TITLE PROG4-5 
         PAGE  60,132 

CLEARSCR MACRO 
;THIS MACRO CLEARS THE SCREEN 
;;USING OPTION 06 OF INT 10H 
         MOV  AX,0600H          ;SCROLL SCREEN FUNCTION 
         MOV  BH,07             ;NORMAL ATTRIBUTE 
         MOV  CX,0              ;FROM ROW=00,COLUMN=00 
         MOV  DX,184FH          ;TO ROW=18H,COLUMN=4FH 
         INT  10H               ;INVOKE THE INTERRUPT 
         ENDM 

DISPLAY  MACRO STRING 
;THIS MACRO DISPLAYS A STRING OF DATA 
;;DX = ADDRESS OF STRING. USES FUNCTION 09 INT 21H. 
         MOV  AH,09             ;DISPLAY STRING FUNCTION 
         MOV  DX,OFFSET STRING  ;DX = OFFSET ADDRESS OF DATA 
         INT  21H               ;INVOKE THE INTERRUPT 
         ENDM 

CURSOR   MACRO ROW,COLUMN 
;THIS MACRO SETS THE CURSOR LOCATION TO ROW,COLUMN 
;;USING BIOS INT 10H FUNCTION 02 
         MOV  AH,02             ;SET CURSOR FUNCTION 
         MOV  BH,00             ;PAGE 00 
         MOV  DH,ROW            ;ROW POSITION 
         MOV  DL,COLUMN         ;COLUMN POSITION 
         INT  10H               ;INVOKE THE INTERRUPT 
         ENDM 


List File for Program 4-5 (continued on the next page) 


LOCAL directive and its use in macros 


In the discussion of macros so far, examples have been chosen that do not have a 
label or name in the body of the macro. The reason is that if a macro is expanded more 
than once in a program and there is a label in the label field of the body of the macro, 
the label must be declared as LOCAL. Otherwise, the assembler would generate an error 
when it encountered the same label in two or more places. 




Microsoft (R) Macro Assembler Version 5.10              1/13/92 00:17:15 
PROG4-5 

                               .MODEL SMALL 
                               .STACK 64 
                               .DATA 
 4D 79 20 6E 61 6D    MESSAGE1 DB   'My name ','$' 
 65 20 24 
 69 73 20 4A 6F 65    MESSAGE2 DB   'is Joe','$' 
 24 
 57 68 61 74 20 69    MESSAGE3 DB   'What is ','$' 
 73 20 24 
 79 6F 75 72 20 6E    MESSAGE4 DB   'your name?','$' 
 61 6D 65 3F 24 
                               .CODE 
                      MAIN     PROC FAR 
                               MOV  AX,@DATA 
                               MOV  DS,AX 
                               .LALL                  ;LIST ALL 
                               CLEARSCR               ;INVOKE CLEAR SCREEN MACRO 
                    1 ;THIS MACRO CLEARS THE SCREEN 
                    1          MOV  AX,0600H          ;SCROLL SCREEN FUNCTION 
                    1          MOV  BH,07             ;NORMAL ATTRIBUTE 
                    1          MOV  CX,0              ;FROM ROW=00,COLUMN=00 
                    1          MOV  DX,184FH          ;TO ROW=18H,COLUMN=4FH 
                    1          INT  10H               ;INVOKE THE INTERRUPT 
                               CURSOR 2,4             ;CURSOR MACRO WILL SET CURSOR 
                    1 ;THIS MACRO SETS THE CURSOR LOCATION 
                    1          MOV  AH,02             ;SET CURSOR FUNCTION 
                    1          MOV  BH,00             ;PAGE 00 
                    1          MOV  DH,2              ;ROW POSITION 
                    1          MOV  DL,4              ;COLUMN POSITION 
                    1          INT  10H               ;INVOKE THE INTERRUPT 
                               DISPLAY MESSAGE1       ;INVOKE DISPLAY MACRO 
                    1 ;THIS MACRO DISPLAYS A STRING OF DATA 
                    1          MOV  AH,09             ;DISPLAY STRING FUNCTION 
                    1          MOV  DX,OFFSET MESSAGE1 ;DX = OFFSET ADDR 
                    1          INT  21H               ;INVOKE THE INTERRUPT 
                               .XALL                  ;LIST ALL EXECUTABLE 
                               CURSOR 7,24            ;SET CURSOR TO ROW=7,COL=24 
                    1          MOV  AH,02             ;SET CURSOR FUNCTION 
                    1          MOV  BH,00             ;PAGE 00 
                    1          MOV  DH,7              ;ROW POSITION 
                    1          MOV  DL,24             ;COLUMN POSITION 
                    1          INT  10H               ;INVOKE THE INTERRUPT 
                               DISPLAY MESSAGE2       ;INVOKE DISPLAY MACRO 
                    1          MOV  AH,09             ;DISPLAY STRING FUNCTION 
                    1          MOV  DX,OFFSET MESSAGE2 ;DX = OFFSET ADDR 
                    1          INT  21H               ;INVOKE THE INTERRUPT 
                               .SALL                  ;SUPPRESS ALL 
                               CURSOR 12,44           ;SET CURSOR TO 
                               DISPLAY MESSAGE3       ;INVOKE DISPLAY MACRO 
                               CURSOR 19,64           ;SET CURSOR TO 
                               DISPLAY MESSAGE4       ;INVOKE DISPLAY MACRO 
                               MOV  AH,4CH 
                               INT  21H               ;GO BACK TO DOS 
                      MAIN     ENDP 
                               END  MAIN 


List File for Program 4-5 (continued from the preceding page) 


The following rules must be observed in the body of the macro: 


1. All labels in the label field must be declared LOCAL. 

2. The LOCAL directive must be right after the MACRO directive. In other words, it 
must be placed even before comments and the body of the macro; otherwise, the 
assembler gives an error. 

3. The LOCAL directive can be used to declare all names and labels at once as follows: 

        LOCAL name1,name2,name3 

or one at a time as: 

        LOCAL name1 
        LOCAL name2 
        LOCAL name3 


To clarify these points, look at Example 4-7. 


Example 4-7 


Write a macro that multiplies two words by repeated addition, then saves the result. 


Solution: 


The following macro can be expanded as often as desired in the same program since the 
label "back" has been declared as LOCAL. 


MULTIPLY MACRO VALUE1,VALUE2,RESULT 
         LOCAL BACK 
;THIS MACRO COMPUTES RESULT = VALUE1 X VALUE2 
;;BY REPEATED ADDITION 
;;VALUE1 AND VALUE2 ARE WORD OPERANDS; RESULT IS A DOUBLEWORD 
         MOV  BX,VALUE1         ;BX=MULTIPLIER 
         MOV  CX,VALUE2         ;CX=MULTIPLICAND 
         SUB  AX,AX             ;CLEAR AX 
         MOV  DX,AX             ;CLEAR DX 
BACK:    ADD  AX,BX             ;ADD BX TO AX 
         ADC  DX,00             ;ADD CARRIES IF THERE IS ONE 
         LOOP BACK              ;CONTINUE UNTIL CX=0 
         MOV  RESULT,AX         ;SAVE THE LOW WORD 
         MOV  RESULT+2,DX       ;SAVE THE HIGH WORD 
         ENDM 


Notice in Example 4-7 that the "BACK" label is declared as LOCAL right after the 
MACRO directive. Declaring it anywhere else causes an error. The use of a LOCAL 
directive allows the assembler to define the label separately each time the macro is 
expanded. The list file for Program 4-6 shows that when the macro is expanded for the 
first time, the list file has "??0000" in place of the "BACK" label. For the second time 
it is "??0001", and for the third time it is "??0002", indicating that the label "BACK" 
is local. To clarify this concept, try Example 4-7 without the LOCAL directive to see 
how the assembler will give an error. 
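
LOCAL is needed for any macro whose body contains a jump target, not only for loops. The following is a minimal sketch, assuming a hypothetical macro named ABSVAL that is not part of the chapter's programs; it is invoked twice so that the assembler has to generate two distinct labels in place of DONE.

```asm
ABSVAL  MACRO REG
        LOCAL DONE
;;ABSVAL IS A HYPOTHETICAL NAME USED ONLY FOR THIS SKETCH
;;REPLACES THE SIGNED WORD IN REG WITH ITS ABSOLUTE VALUE
        CMP   REG,0
        JGE   DONE              ;already non-negative, nothing to do
        NEG   REG               ;two's complement of a negative value
DONE:
        ENDM

        .MODEL SMALL
        .STACK 64
        .CODE
MAIN    PROC  FAR
        MOV   AX,-25
        ABSVAL AX               ;first expansion
        MOV   BX,-300
        ABSVAL BX               ;second expansion gets its own label
        MOV   AH,4CH
        INT   21H               ;go back to DOS
MAIN    ENDP
        END   MAIN
```

Note that the LOCAL line comes directly after the MACRO line, ahead of the comments, as rule 2 requires. Removing the LOCAL line should make the second invocation fail to assemble, since DONE would then be defined twice.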


INCLUDE directive 


Assume that there are several macros that are used in every program. Must they 
be rewritten every time? The answer is no if the concept of the INCLUDE directive is 
known. The INCLUDE directive allows a programmer to write macros and save them in 
a file, and later bring them into any file. For example, assume that the following widely 
used macros were written and then saved under the filename "MYMACRO1.MAC". 




Use the macro definition in Example 4-7 to write a program that multiplies the following: 
(1) 2000 x 500 (2) 2500 x 500 (3) 300 x 400 


TITLE    PROG4-6 
PAGE     60,132 
MULTIPLY MACRO VALUE1,VALUE2,RESULT 
         LOCAL BACK 
;THIS MACRO COMPUTES RESULT = VALUE1 X VALUE2 
;;BY REPEATED ADDITION 
;;VALUE1 AND VALUE2 ARE WORD OPERANDS; RESULT IS A DOUBLEWORD 
         MOV  BX,VALUE1         ;BX=MULTIPLIER 
         MOV  CX,VALUE2         ;CX=MULTIPLICAND 
         SUB  AX,AX             ;CLEAR AX 
         MOV  DX,AX             ;CLEAR DX 
BACK:    ADD  AX,BX             ;ADD BX TO AX 
         ADC  DX,00             ;ADD CARRIES IF THERE IS ONE 
         LOOP BACK              ;CONTINUE UNTIL CX=0 
         MOV  RESULT,AX         ;SAVE THE LOW WORD 
         MOV  RESULT+2,DX       ;SAVE THE HIGH WORD 
         ENDM 

         .MODEL SMALL 
         .STACK 64 
         .DATA 
RESULT1  DW   2 DUP (0) 
RESULT2  DW   2 DUP (0) 
RESULT3  DW   2 DUP (0) 
         .CODE 
MAIN     PROC FAR 
         MOV  AX,@DATA 
         MOV  DS,AX 
         MULTIPLY 2000,500,RESULT1 
         MULTIPLY 2500,500,RESULT2 
         MULTIPLY 300,400,RESULT3 
         MOV  AH,4CH 
         INT  21H               ;GO BACK TO DOS 
MAIN     ENDP 
         END  MAIN 


Program 4-6 


Microsoft (R) Macro Assembler Version 5.10              1/13/92 00:33:14 
PROG4-6 

         TITLE PROG4-6 
         PAGE  60,132 

MULTIPLY MACRO VALUE1,VALUE2,RESULT 
         LOCAL BACK 
;THIS MACRO COMPUTES RESULT = VALUE1 X VALUE2 
;;BY REPEATED ADDITION 
;;VALUE1 AND VALUE2 ARE WORD OPERANDS; RESULT IS A DOUBLEWORD 
         MOV  BX,VALUE1         ;BX=MULTIPLIER 
         MOV  CX,VALUE2         ;CX=MULTIPLICAND 
         SUB  AX,AX             ;CLEAR AX 
         MOV  DX,AX             ;CLEAR DX 
BACK:    ADD  AX,BX             ;ADD BX TO AX 
         ADC  DX,00             ;ADD CARRIES IF THERE IS ONE 
         LOOP BACK              ;CONTINUE UNTIL CX=0 
         MOV  RESULT,AX         ;SAVE THE LOW WORD 
         MOV  RESULT+2,DX       ;SAVE THE HIGH WORD 
         ENDM 


List File for Program 4-6 (continued on the next page) 


Microsoft (R) Macro Assembler Version 5.10              1/13/92 00:33:14 
PROG4-6                                                 Page 1-2 

           .MODEL SMALL 
           .STACK 64 
           .DATA 
  RESULT1  DW   2 DUP (0) 
  RESULT2  DW   2 DUP (0) 
  RESULT3  DW   2 DUP (0) 
           .CODE 
  MAIN     PROC FAR 
           ASSUME CS:CDSEG,DS:DTSEG,SS:STSEG 
           MOV  AX,@DATA 
           MOV  DS,AX 
           MULTIPLY 2000,500,RESULT1 
1          MOV  BX,2000           ;BX=MULTIPLIER 
1          MOV  CX,500            ;CX=MULTIPLICAND 
1          SUB  AX,AX             ;CLEAR AX 
1          MOV  DX,AX             ;CLEAR DX 
1 ??0000:  ADD  AX,BX             ;ADD BX TO AX 
1          ADC  DX,00             ;ADD CARRIES IF THERE 
1          LOOP ??0000            ;CONTINUE UNTIL CX=0 
1          MOV  RESULT1,AX        ;SAVE THE LOW WORD 
1          MOV  RESULT1+2,DX      ;SAVE THE HIGH WORD 
           MULTIPLY 2500,500,RESULT2 
1          MOV  BX,2500           ;BX=MULTIPLIER 
1          MOV  CX,500            ;CX=MULTIPLICAND 
1          SUB  AX,AX             ;CLEAR AX 
1          MOV  DX,AX             ;CLEAR DX 
1 ??0001:  ADD  AX,BX             ;ADD BX TO AX 
1          ADC  DX,00             ;ADD CARRIES IF THERE 
1          LOOP ??0001            ;CONTINUE UNTIL CX=0 
1          MOV  RESULT2,AX        ;SAVE THE LOW WORD 
1          MOV  RESULT2+2,DX      ;SAVE THE HIGH WORD 
           MULTIPLY 300,400,RESULT3 
1          MOV  BX,300            ;BX=MULTIPLIER 
1          MOV  CX,400            ;CX=MULTIPLICAND 
1          SUB  AX,AX             ;CLEAR AX 
1          MOV  DX,AX             ;CLEAR DX 
1 ??0002:  ADD  AX,BX             ;ADD BX TO AX 
1          ADC  DX,00             ;ADD CARRIES IF THERE 
1          LOOP ??0002            ;CONTINUE UNTIL CX=0 
1          MOV  RESULT3,AX        ;SAVE THE LOW WORD 
1          MOV  RESULT3+2,DX      ;SAVE THE HIGH WORD 
           MOV  AH,4CH 
           INT  21H               ;GO BACK TO DOS 
  MAIN     ENDP 
           END  MAIN 


List File for Program 4-6 (continued from the preceding page) 




CLEARSCR MACRO                  ;the clear screen macro 
         MOV  AX,0600H 
         MOV  BH,07 
         MOV  CX,0000 
         MOV  DX,184FH 
         INT  10H 
         ENDM 

DISPLAY  MACRO STRING           ;the string display macro 
         MOV  AH,09 
         MOV  DX,OFFSET STRING 
         INT  21H 
         ENDM 

REGSAVE  MACRO                  ;this macro saves all the registers 
         PUSH AX 
         PUSH BX 
         PUSH CX 
         PUSH DX 
         PUSH DI 
         PUSH SI 
         PUSH BP 
         PUSHF 
         ENDM 

REGRESTO MACRO                  ;this macro restores all the registers 
         POPF 
         POP  BP 
         POP  SI 
         POP  DI 
         POP  DX 
         POP  CX 
         POP  BX 
         POP  AX 
         ENDM 

Assuming that these macros are saved on a disk under the filename 
"MYMACRO1.MAC", the INCLUDE directive can be used to bring this file into any 
".asm" file, and then the program can call upon any of the macros as many times as 
needed. When a file is included, all of its macros are listed at the beginning of the 
".lst" file, and as they are expanded, they become part of the program. To understand 
this, see Program 4-7. Program 4-7 includes macros to clear the screen, set the cursor, 
and display strings. These macros are all saved under the "MYMACRO2.MAC" filename. 
The ".asm" and ".lst" versions of the program, which uses only the clear screen and 
display string macros to display "This is a test of macro concepts", are shown on the 
following pages. 


TITLE    PROG4-7 
PAGE     60,132 
         INCLUDE MYMACRO2.MAC 
         .MODEL SMALL 
         .STACK 64 
         .DATA 
MESSAGE1 DB   'This is a test of macro concepts','$' 
         .CODE 
MAIN     PROC FAR 
         MOV  AX,@DATA 
         MOV  DS,AX 
         CLEARSCR               ;INVOKE CLEAR SCREEN MACRO 
         DISPLAY MESSAGE1       ;INVOKE DISPLAY MACRO 
         MOV  AH,4CH 
         INT  21H               ;GO BACK TO DOS 
MAIN     ENDP 
         END  MAIN 


Program 4-7 




Microsoft (R) Macro Assembler Version 5.10              1/13/92 00:41:49 
PROG4-7 

                               TITLE PROG4-7 
                               PAGE  60,132 
                               INCLUDE MYMACRO2.MAC 
                    C ;MYMACRO2 (MAC) FOR PROGRAM5-3 
                    C CURSOR   MACRO ROW,COLUMN 
                    C ;THIS MACRO SETS THE CURSOR LOCATION AT ROW,COLUMN 
                    C ;;USING BIOS INT 10H FUNCTION 02 
                    C          MOV  AH,02             ;SET CURSOR FUNCTION 
                    C          MOV  BH,00             ;PAGE 00 
                    C          MOV  DH,ROW            ;ROW POSITION 
                    C          MOV  DL,COLUMN         ;COLUMN POSITION 
                    C          INT  10H               ;INVOKE THE INTERRUPT 
                    C          ENDM 
                    C 
                    C DISPLAY  MACRO STRING 
                    C ;THIS MACRO DISPLAYS A STRING OF DATA 
                    C ;;DX = ADDRESS OF STRING. USES FUNCTION 09 INT 21H. 
                    C          MOV  AH,09             ;DISPLAY STRING FUNCTION 
                    C          MOV  DX,OFFSET STRING  ;DX = OFFSET ADDRESS OF DATA 
                    C          INT  21H               ;INVOKE THE INTERRUPT 
                    C          ENDM 
                    C 
                    C CLEARSCR MACRO 
                    C ;THIS MACRO CLEARS THE SCREEN 
                    C ;;USING OPTION 06 OF INT 10H 
                    C          MOV  AX,0600H          ;SCROLL SCREEN FUNCTION 
                    C          MOV  BH,07             ;NORMAL ATTRIBUTE 
                    C          MOV  CX,0              ;FROM ROW=00,COLUMN=00 
                    C          MOV  DX,184FH          ;TO ROW=18H,COLUMN=4FH 
                    C          INT  10H               ;INVOKE THE INTERRUPT 
                    C          ENDM 

 0000                          .MODEL SMALL 
 0000                          .STACK 64 
Microsoft (R) Macro Assembler Version 5.10              1/13/92 00:41:49 
PROG4-7                                                 Page 1-2 

 0000                          .DATA 
 0000 54 68 69 73 20 69 MESSAGE1 DB 'This is a test of macro concepts','$' 
      73 20 61 20 74 65 
      73 74 20 6F 66 20 
      6D 61 63 72 6F 20 
      63 6F 6E 63 65 70 
      74 73 24 
                               .CODE 
                      MAIN     PROC FAR 
      B8 ---- R                MOV  AX,@DATA 
                               MOV  DS,AX 
                               CLEARSCR               ;INVOKE CLEAR SCREEN MACRO 
      B8 0600       1          MOV  AX,0600H          ;SCROLL SCREEN FUNCTION 


List File for Program 4-7 (continued on the next page) 




                    1          MOV  BH,07             ;NORMAL ATTRIBUTE 
                    1          MOV  CX,0              ;FROM ROW=00,COLUMN=00 
                    1          MOV  DX,184FH          ;TO ROW=18H,COLUMN=4FH 
                    1          INT  10H               ;INVOKE THE INTERRUPT 
                               DISPLAY MESSAGE1       ;INVOKE DISPLAY MACRO 
                    1          MOV  AH,09             ;DISPLAY STRING 
                    1          MOV  DX,OFFSET MESSAGE1 ;DX = OFFSET ADDR 
                    1          INT  21H               ;INVOKE THE INTERRUPT 
                               MOV  AH,4CH 
                               INT  21H               ;GO BACK TO DOS 
                      MAIN     ENDP 
                               END  MAIN 


List File for Program 4-7 (continued from the preceding page) 


Notice that in the list file of Program 4-7, the letter "C" in front of the lines indi- 
cates that they are copied from another file and included in the present file. 
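
The register-saving macros listed earlier can be brought in the same way. The following is a sketch only, assuming that "MYMACRO1.MAC" contains the CLEARSCR, DISPLAY, REGSAVE, and REGRESTO macros exactly as shown above and that the file is in the directory from which the assembler is run; the name PROMPT is hypothetical.

```asm
        INCLUDE MYMACRO1.MAC    ;assumed to be in the current directory
        .MODEL SMALL
        .STACK 64
        .DATA
PROMPT  DB    'Registers preserved','$'   ;PROMPT is a hypothetical name
        .CODE
MAIN    PROC  FAR
        MOV   AX,@DATA
        MOV   DS,AX
        REGSAVE                 ;push AX,BX,CX,DX,DI,SI,BP and the flags
        CLEARSCR                ;changes AX,BH,CX,DX
        DISPLAY PROMPT          ;changes AH,DX
        REGRESTO                ;pop everything back in reverse order
        MOV   AH,4CH
        INT   21H               ;go back to DOS
MAIN    ENDP
        END   MAIN
```

REGSAVE and REGRESTO must always be used as a pair, since REGRESTO pops the registers in exactly the reverse of the order in which REGSAVE pushed them.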


Review Questions 


1. Discuss the benefits of macro programming. 
2. List the three parts of a macro. 
3. Explain and contrast macro definition, invoking the macro, and expanding the macro. 
4. True or false. A label defined within a macro is automatically understood by the 
assembler to be local. 
5. True or false. In the list file for Program 4-7, the "C" at the beginning of a line 
indicates that it is a comment. 




PROBLEMS 


SECTION 4.1: BIOS INT 10H PROGRAMMING 


1. Write a program that: 
(a) Clears the screen, and (b) sets the cursor position at row = 5 and column = 12. 
2. What is the function of the following program? 
MOV AH,02 
MOV BH,00 
MOV DL, 20 
MOV DH,10 
INT 10H 
3. The following program is meant to set the cursor at position row = 14 and column = 20. 
Fix the error and run the program to verify your solution. 
MOV AH,02 
MOV BH,00 
MOV DH,14H 
MOV DL,20H 
INT 10H 


4. Write a program that sets the cursor at row = 12, column = 15, then use the code below 
to get the current cursor position in register DX with DH = row and DL = column. Is 
the cursor position in DH and DL in decimal or hex? Verify your answer. 


MOV AH,03 
MOV BH, 00 
INT 10H 


5. In clearing the screen, does the sequence of code prior to INT 10H matter? In setting 
a cursor position? Verify by rearranging and executing the instructions. 




6. You want to clear the screen using the following program, but there are some errors. 
Fix the errors and run the program to verify it. 
MOV AX,0600H 
MOV BH,07 
MOV CX,0000 
MOV DX,184F 
INT 10H 
7. Write a program that: 
(a) Clears the screen 
(b) Sets the cursor at row = 8 and column = 14 
(c) Displays the string "IBM Personal Computer" 


SECTION 4.2: DOS INTERRUPT 21H 


8. Run the following program and dump the memory to verify the contents of memory 
locations 0220H to 022FH if "IBM PC with 8088 CPU" is keyed in. 


ORG 220H 
BUFFER DB iy, IG DUI (OF) 


and for the code: 
MOV AH, OAH 
MOV DX,OFFSET BUFFER 
INT 21H 


9. Write a program that: 
(a) Clears the screen. 
(b) Puts the cursor on position row = 15 and column = 20. 
(c) Displays the prompt "What is your name?" 
(d) Gets a response from the keyboard and displays it at row = 17 and column = 20. 
10. Write a program that sets the mode to medium resolution, draws a vertical line in the 
middle of the screen, then draws a horizontal line across the middle of the screen. 
11. Write a program to input a social security number in the form 123-45-6789 and trans- 
fer it to another area with the hyphens removed, as in 123456789. Use the following 
data definition. 


SS_AREA   LABEL BYTE 
SS_SIZE   DB 12 
SS_ACTUAL DB ? 
SS_DASHED DB 12 DUP (?) 
SS_NUM    DB 9 DUP (?) 


12. Write a program (use the simplified segment definition) to input two seven-digit num- 
bers in response to the prompts "Enter the first number" and "Enter the second num- 
ber". Add them together and display the sum with the message "The total sum is: ". 


SECTION 4.3: WHAT IS A MACRO AND HOW IS IT USED? 
13. Every macro must start with directive ____ and end with directive ____. 
14. Identify the name, body, and dummy argument in the following macro: 
WORK_HOUR MACRO OVRTME_HR 
          MOV AL,40             ;WEEKLY HRS 
          ADD AL,OVRTME_HR      ;TOTAL HRS WORKED 
          ENDM 
15. Explain the difference between the .SALL, .LALL, and .XALL directives. 
16. What is the total value in registers DX and AX after invoking the following macro? 
The macro is invoked as: WAGES 60000,25000,3000 
WAGES   MACRO SALARY,OVERTIME,BONUSES 
;TOTAL WAGES=SALARY + OVERTIME + BONUSES 
        SUB AX,AX               ;CLEAR 
        MOV DX,AX               ;AX AND DX 
        ADD AX,SALARY 
        ADD AX,OVERTIME 
        ADC DX,0                ;TAKE CARE OF CARRY 
        ADD AX,BONUSES 
        ADC DX,0 
        ENDM 
17. In Problem 16, in the body of the macro, dummies were used as they are listed from 
left to right. Can they be used in any order? Rewrite the body (leave the dummies 
alone) by adding OVERTIME first. 
18. In Problem 16, state the comments that are listed if the macro is expanded as: 
        .LALL 
        WAGES X,Y,Z 
19. Macros can use registers as dummies. Show the ".lst" file and explain what the macro 
in Problem 16 does if it is invoked as follows: 
        WAGES BX,CX,SI 
20. Fill in the blanks for the following macro to add an array of bytes. Some blanks might 
not need to be filled. 
SUMMING MACRO COUNT,VALUES 
        LOCAL ____ 
;;this macro adds an array of byte size elements. 
;;AX will hold the total sum 
        MOV  CX,____            ;size of array 
        MOV  SI,____            ;load offset address of array 
        SUB  AX,AX              ;clear AX 
AGAIN:  ADD  AL,[SI] 
        ADC  AH,0               ;add bytes and takes care of carries 
        INC  SI                 ;point to next byte 
        LOOP AGAIN              ;continue until finished 
        ENDM ____ 
21. Invoke and run the macro above for the following data. 
;In the data segment 
DATA1 DB Od. Sp 3, Sal aS 
SUM1 DW ? 
DATA2 DB 357 O89), D9, Is sip D5 D4 pes 
SUM2 DW ? 
DATA3 DB 10 DUP (99) 
SUM3 DW ? 
(Hint: For the format, see Problem 20.) 
22. Insert the listing directives in Problem 21 as follows and analyze the ".lst" file. 
From the code segment: 
        .LALL 
        SUMMING 5,DATA1         ;adding and saving data1 
        MOV SUM1,AX 
        .XALL 
        SUMMING ........        ;adding and saving data2 
        .SALL 
        ........                ;adding and saving data3 
23. Rewrite Problem 20 to have a third dummy argument for SUM. Then rework 
Problem 19. 
24. Rewrite Program 4-6 using the DD directive for RESULT1, RESULT2, and 
RESULT3. 




ANSWERS TO REVIEW QUESTIONS 


SECTION 4.1: BIOS INT 10H PROGRAMMING 


1. Perform screen I/O 
2. 80, 25; 00,00 and 24,79 
3. MOV AH,06    ;SELECT CLEAR SCREEN FUNCTION 
   MOV AL,00    ;AL=0 TO SCROLL ENTIRE PAGE 
   MOV BH,07    ;BH=07 FOR NORMAL ATTRIBUTE 
   MOV CH,00    ;START AT ROW 00 
   MOV CL,00    ;START AT COLUMN 00 
   MOV DH,24    ;END AT ROW 24 
   MOV DL,79    ;END AT COLUMN 79 
   INT 10H      ;INVOKE THE INTERRUPT 
4. Indicates that the cursor is at row 5, column 34 
5. It provides information about the foreground and background intensity, whether the 
foreground is blinking and/or highlighted. 
6. Character 
7. 10100111 
8. The first time INT 10H is invoked, it sets the cursor to position 00,00; the second time 
it is invoked, it displays the character '*' 80 times with attributes of white on black, 
blinking. 


SECTION 4.2: DOS INTERRUPT 21H 


1. 10H, 21H 
2. The rightmost code inputs a string from the keyboard into a buffer; the code on the 
left outputs a string from a buffer to the monitor. 
3. The end of the string is the dollar sign '$'. 
4. 09H, OFFSET MESSAGE1 
5. When the '$' within the string is encountered, the computer will stop displaying the 
string. 
6. BELL EQU 07H 
7. Using the EQU in Answer 6, the code segment would include the following: 
   MOV AH,02 
   MOV DL,BELL 
   INT 21H 
8. SS_AREA   LABEL BYTE 
   SS_SIZE   DB 12 
   SS_ACTUAL DB ? 
   SS_NUM    DB 12 DUP (?) 


SECTION 4.3: WHAT IS A MACRO AND HOW IS IT USED? 


1. Macro programming can save the programmer time by allowing a set of frequently 
repeated instructions to be invoked within the program by a single line. This can also 
make the code easier to read. 
2. The three parts of a macro are the MACRO directive, the body, and the ENDM directive. 
3. The macro definition is the list of statements the macro will perform. It begins with 
the MACRO directive and ends with the ENDM directive. Invoking the macro is 
when the macro is called from within an Assembly language program. Expanding the 
macro is when the assembler replaces the line invoking the macro with the Assembly 
language code in the body of the macro. 
4. False. A label that is to be local to a macro must be declared local with the LOCAL 
directive. 
5. False. The "C" at the beginning of a line indicates that this line of code was brought 
in from another file by the INCLUDE directive. 


CHAPTER 5 


KEYBOARD AND MOUSE 
PROGRAMMING 


OBJECTIVES 
Upon completion of this chapter, you will be able to: 


>> Code Assembly language instructions using INT 16H to get and check 
the keyboard input buffer and status bytes 

>> Code Assembly language instructions for key press and detection 

>> Use INT 33H to control mouse functions in text and graphics modes 

>> Code Assembly language instructions to initialize the mouse 

>> Code Assembly language instructions to set or get the mouse cursor 
position 

>> Use INT 33H functions to retrieve mouse button press or release 
information 


>> Limit mouse cursor positions by setting boundaries or defining 
exclusion areas 


This chapter explores keyboard and mouse programming on the x86 PC. In Section 
5.1, we use INT 16H function calls to access keyboard input. Then in Section 5.2, we 
demonstrate the use of INT 33H to control mouse functions. 


SECTION 5.1: INT 16H KEYBOARD PROGRAMMING 


In this section we first look at the keyboard used in IBM PC and compatible com- 
puters. Next we examine the scan codes that uniquely identify each key on the keyboard 
so that the computer can determine which key was pressed. Then the use of INT 16H is 
described for checking if a key has been pressed, identifying which key is pressed, and 
other functions. 


IBM PC/XT keyboard 
The original IBM PC keyboard had 83 keys, arranged in three major groupings: 
1. The standard typewriter keys 


2. Ten function keys, F1 to F10 
3. 15-key keypad 


These 83 keys are shown in Table 5-1. In later years, enhanced keyboards with 
101 keys became popular. 


IBM PC scan codes 


In IBM PC and compatible computers, each key on the keyboard is associated 
with a scan code. Tables 5-1, 5-2, and 5-3 provide the scan codes for both the original PC 
and enhanced keyboards. 


Table 5-1: PC Scan Codes for 83 PC Keys 

Hex  Key          Hex  Key          Hex  Key            Hex  Key 
01   Esc          17   I and i      2D   X and x        43   F9 
02   ! and 1      18   O and o      2E   C and c        44   F10 
03   @ and 2      19   P and p      2F   V and v        45   Num Lock 
04   # and 3      1A   { and [      30   B and b        46   Scroll Lock 
05   $ and 4      1B   } and ]      31   N and n        47   7 and Home 
06   % and 5      1C   enter        32   M and m        48   8 and ↑ 
07   ^ and 6      1D   ctrl         33   < and ,        49   9 and PgUp 
08   & and 7      1E   A and a      34   > and .        4A   - (gray) 
09   * and 8      1F   S and s      35   ? and /        4B   4 and ← 
0A   ( and 9      20   D and d      36   Right Shift    4C   5 (keypad) 
0B   ) and 0      21   F and f      37   PrtSc and *    4D   6 and → 
0C   _ and -      22   G and g      38   Alt            4E   + (gray) 
0D   + and =      23   H and h      39   space bar      4F   1 and End 
0E   backspace    24   J and j      3A   Caps Lock      50   2 and ↓ 
0F   tab          25   K and k      3B   F1             51   3 and PgDn 
10   Q and q      26   L and l      3C   F2             52   0 and Ins 
11   W and w      27   : and ;      3D   F3             53   . and Del 
12   E and e      28   " and '      3E   F4 
13   R and r      29   ~ and `      3F   F5 
14   T and t      2A   Left Shift   40   F6 
15   Y and y      2B   | and \      41   F7 
16   U and u      2C   Z and z      42   F8 


(Reprinted by permission from “IBM BIOS Technical Reference” c. 1987 by International Business Machines Corporation) 




In Table 5-1, notice that the same scan code is used for a given lowercase letter 
and its capital. The same is true for all the keys with dual labels. If the scan code is the 
same for both of them, how does the system distinguish between them? This is taken care 
of by the keyboard shift status byte. Some INT 16H function calls provide the status byte 
in AL, as we will see in later examples. The meaning of each bit of the keyboard status 
byte is given in Table 5-4. Notice that some of the bits are used for the 101-key enhanced 
keyboards. 
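
As an illustration of how the status byte might be examined, the sketch below reads it and tests the Caps Lock bit. It is not one of the chapter's programs and rests on two assumptions: that INT 16H function AH = 02 is the call used to return the status byte in AL, and that bit 6 reports the Caps Lock state as listed in Table 5-4. The names CAPS_ON, CAPS_OFF, and SHOW are hypothetical.

```asm
         .MODEL SMALL
         .STACK 64
         .DATA
CAPS_ON  DB    'Caps Lock is on','$'     ;hypothetical message names
CAPS_OFF DB    'Caps Lock is off','$'
         .CODE
MAIN     PROC  FAR
         MOV   AX,@DATA
         MOV   DS,AX
         MOV   AH,02            ;assumed: INT 16H function 02 = get status
         INT   16H              ;AL = keyboard status byte
         MOV   DX,OFFSET CAPS_OFF
         TEST  AL,40H           ;bit 6 = Caps Lock toggled
         JZ    SHOW             ;bit clear: keep the "off" message
         MOV   DX,OFFSET CAPS_ON
SHOW:    MOV   AH,09            ;display the selected message
         INT   21H
         MOV   AH,4CH
         INT   21H              ;go back to DOS
MAIN     ENDP
         END   MAIN
```

TEST is used rather than AND so that the status byte in AL is left unchanged and other bits can be checked afterward.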


Table 5-2: Combination Key Scan Codes 

Hex  Key          Hex  Key          Hex  Key            Hex  Key 
54   Shift F1     60   Ctrl F3      6C   Alt F5         78   Alt 1 
55   Shift F2     61   Ctrl F4      6D   Alt F6         79   Alt 2 
56   Shift F3     62   Ctrl F5      6E   Alt F7         7A   Alt 3 
57   Shift F4     63   Ctrl F6      6F   Alt F8         7B   Alt 4 
58   Shift F5     64   Ctrl F7      70   Alt F9         7C   Alt 5 
59   Shift F6     65   Ctrl F8      71   Alt F10        7D   Alt 6 
5A   Shift F7     66   Ctrl F9      72   Ctrl PrtSc     7E   Alt 7 
5B   Shift F8     67   Ctrl F10     73   Ctrl ←         7F   Alt 8 
5C   Shift F9     68   Alt F1       74   Ctrl →         80   Alt 9 
5D   Shift F10    69   Alt F2       75   Ctrl End       81   Alt 0 
5E   Ctrl F1      6A   Alt F3       76   Ctrl PgDn 
5F   Ctrl F2      6B   Alt F4       77   Ctrl Home 


(Reprinted by permission from “IBM BIOS Technical Reference” c. 1987 by International Business Machines Corporation) 


Table 5-3: Extended Keyboard Scan Codes 

Hex  Key          Hex  Key          Hex  Key            Hex  Key 
85   F11          8E   Ctrl -       97   Alt Home       A0   Alt ↓ 
86   F12          8F   Ctrl 5       98   Alt ↑          A1   Alt PgDn 
87   Shift F11    90   Ctrl +       99   Alt PgUp       A2   Alt Ins 
88   Shift F12    91   Ctrl ↓       9A                  A3   Alt Del 
89   Ctrl F11     92   Ctrl Ins     9B   Alt ←          A4   Alt / 
8A   Ctrl F12     93   Ctrl Del     9C                  A5   Alt Tab 
8B   Alt F11      94   Ctrl Tab     9D   Alt →          A6   Alt Enter 
8C   Alt F12      95   Ctrl /       9E 
8D   Ctrl ↑       96   Ctrl *       9F   Alt End 


(Reprinted by permission from “IBM BIOS Technical Reference” c. 1987 by International Business Machines Corporation) 


Table 5-4: Keyboard Status Byte 

Bit  If = 1                 Mask Code 
0    Right Shift pressed    FEH 
1    Left Shift pressed     FDH 
2    Ctrl pressed           FBH 
3    Alt pressed            F7H 
4    Scroll Lock toggled    EFH 
5    NumLock toggled        DFH 
6    CapsLock toggled       BFH 
7    Ins toggled            7FH 


When a key is pressed, the operating system stores its scan code in memory 
locations called a keyboard buffer, located in the BIOS data area. To relieve 
programmers from the details of keyboard interaction with the motherboard, IBM has 
provided INT 16H. Next we look at the services provided by the BIOS INT 16H. 




Checking a key press 


Chapter 4 demonstrated the use of INT 21H function AH = 07, which waits for 
the user to input a character. What if a program must run a certain task continuously while 
checking for a keypress? Such cases require the use of INT 16H, a BIOS interrupt used 
exclusively for the keyboard. To check a key press we use INT 16H function AH = 01. 

        MOV   AH,01     ;check for key press
        INT   16H       ;using INT 16H


Upon return, ZF = 0 if there is a key press; ZF = 1 if there is no key press. Notice 
that this function does not wait for the user to press a key. It simply checks to see if there 
is a key press. If a character is available, it returns its scan code in AH and its ASCII code 
in AL. The use of this function is best understood in the context of examples. Program 5- 
1 sends the ASCII bell character, 07 hex (see Appendix F), to the screen continuously. To 
stop the bell sound, the user must press any key. 


TITLE   PROGRAM 5-1: KEYBOARD HIT USING INT 16H
;THIS PROGRAM SOUNDS THE BELL CONTINUOUSLY UNLESS ANY KEY IS PRESSED
        .MODEL SMALL
        .STACK
        .DATA
MESSAGE DB    'TO STOP THE BELL SOUND PRESS ANY KEY$'
        .CODE
MAIN    PROC
        MOV   AX,@DATA
        MOV   DS,AX
        MOV   AH,09
        MOV   DX,OFFSET MESSAGE   ;DISPLAY THE MESSAGE
        INT   21H
AGAIN:  MOV   AH,02               ;SENDING TO MONITOR A SINGLE CHAR
        MOV   DL,07               ;SEND OUT THE BELL CHAR
        INT   21H
        MOV   AH,01               ;CHECK THE KEY PRESS
        INT   16H                 ;USING INT 16H
        JZ    AGAIN               ;IF NO KEY PRESS STAY IN THE LOOP
        MOV   AH,4CH              ;IF ANY KEY PRESSED GO BACK TO DOS
        INT   21H
MAIN    ENDP
        END   MAIN


Program 5-1


Which key is pressed? 


There are times when the program needs to know not only if a key has been 
pressed but also which key was pressed. To do that, INT 16H function AH = 0 can be used 
immediately after the call to INT 16H function AH = 01. 


        MOV   AH,0      ;get key pressed
        INT   16H       ;using INT 16H


Upon return, AL contains the ASCII character of the pressed key; its scan code is
in AH. To read the key that INT 16H function AH = 01 has detected, call this function
immediately afterward. Program 5-2 demonstrates how it works.

INT 16H function AH = 0 can also be used by itself to get a character from the 
keyboard. The difference between these two functions is that AH = 1 comes back whether 
or not a key has been pressed whereas AH = 0 doesn’t return until a key is pressed. For 
characters such as F1-F10 for which there is no ASCII code, it simply provides the scan 
code in AH and AL = 0. Therefore, if AL = 0, a special function key was pressed. This 
option simply provides the code for the character and does not display it. 
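
As a rough illustration of how a program interprets the value returned in AX, the following Python sketch splits AX into its AH and AL halves and applies the AL = 0 rule. It is a model only (no BIOS call is made); the function name decode_key is hypothetical, and the sample values 3B00H (F1) and 1051H (uppercase Q) are taken from the standard PC scan code assignments.

```python
def decode_key(ax):
    """Split the AX value returned by INT 16H, AH = 0 (illustrative model).

    Returns (scan_code, ascii_code, is_special), where is_special is True
    when AL = 0, i.e. the key has no ASCII code.
    """
    scan_code = (ax >> 8) & 0xFF    # AH holds the scan code
    ascii_code = ax & 0xFF          # AL holds the ASCII code (0 for special keys)
    return scan_code, ascii_code, ascii_code == 0


for ax in (0x3B00, 0x1051):
    scan, ascii_code, special = decode_key(ax)
    kind = "special key" if special else f"character {chr(ascii_code)!r}"
    print(f"AX={ax:04X}H  scan={scan:02X}H  {kind}")
```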




TITLE   PROGRAM 5-2: MODIFIED VERSION OF PROGRAM 5-1
;THIS PROGRAM SOUNDS THE BELL CONTINUOUSLY UNTIL 'Q' OR 'q' IS PRESSED
        .MODEL SMALL
        .STACK
        .DATA
MESSAGE DB    'TO STOP THE BELL SOUND PRESS Q (or q) KEY$'
        .CODE
MAIN    PROC
        MOV   AX,@DATA
        MOV   DS,AX
        MOV   AH,09
        MOV   DX,OFFSET MESSAGE   ;DISPLAY THE MESSAGE
        INT   21H
AGAIN:  MOV   AH,02
        MOV   DL,07               ;SOUND THE BELL BY SENDING OUT BELL CHAR
        INT   21H
        MOV   AH,01               ;CHECK FOR KEY PRESS
        INT   16H                 ;USING INT 16H
        JZ    AGAIN               ;IF NO KEY PRESS KEEP SOUNDING THE BELL
        MOV   AH,0                ;TO GET THE CHARACTER
        INT   16H                 ;WE MUST USE INT 16H ONE MORE TIME
        CMP   AL,'Q'              ;IS IT 'Q'?
        JE    EXIT                ;IF YES EXIT
        CMP   AL,'q'              ;IS IT 'q'?
        JE    EXIT                ;IF YES EXIT
        JMP   AGAIN               ;NO. KEEP SOUNDING THE BELL
EXIT:   MOV   AH,4CH              ;GO BACK TO DOS
        INT   21H
MAIN    ENDP
        END   MAIN


Program 5-2


Other INT 16H functions 


Due to additional keys on the IBM extended keyboard, BIOS added the follow- 
ing additional services to INT 16H. 


INT 16H, AH = 10H (read a character) 


This is the same as AH = 0 except that it also accepts the additional keys on the 
IBM extended (enhanced) keyboard. 


INT 16H, AH = 11H (find if a character is available) 


This is the same as AH = 1 except that it also accepts the additional keys on the
IBM extended (enhanced) keyboard.


Review Questions 


1. Which function of INT 16H is used for key press detection?
2. In the above question, how do you know if a key is pressed?
3. In the above question, how can the ASCII value for the pressed key be obtained?
4. Indicate the main difference between INT 21H function AH = 07 and INT 16H function
   AH = 01.
5. Write a simple program to sound the bell unless the letter 'X' is pressed. If 'X' is
   pressed, the program should exit.




SECTION 5.2: MOUSE PROGRAMMING WITH INT 33H 


Next to the keyboard, the mouse is one of the most widely used input devices. 
This section describes how to use INT 33H to add mouse capabilities to programs. 


INT 33H 


The original IBM PC and DOS did not provide support for the mouse. For this 
reason, mouse interrupt INT 33H is not part of BIOS or DOS. This is in contrast to INT 
21H and INT 10H, which are the DOS and BIOS interrupts, respectively. Now, INT 33H 
is part of the mouse driver software that is installed when the PC is booted. 


Detecting the presence of a mouse 


While new PCs come with a mouse and driver already installed by the PC manu- 
facturer, many older-generation PCs in use do not have a mouse. Therefore, the first task 
of any INT 33H program should be to verify the presence of a mouse and the number of 
buttons it supports. This is the purpose of INT 33H function AX = 0. Upon return from 
INT 33H, if AX = 0 then no mouse is supported. If AX = FFFFH, the mouse is supported 
and the number of mouse buttons will be contained in register BX. Although most mice 
have two buttons, right and left, there are some with middle buttons as well. See the fol- 
lowing code. 


        MOV   AX,0          ;mouse initialization option
        INT   33H
        CMP   AX,0          ;check AX contents after INT 33H
        JE    EXIT          ;exit if AX=0 since no mouse available
        MOV   M_BUTTON,BX   ;mouse is there, save number of buttons


Notice the following points about the way INT 33H is called.


1. In INT 21H and INT 10H, the AH register is used to select the functions. This is not
the case in INT 33H. In INT 33H the register AL is used to select various functions
and AH is set to 0. That is the reason behind the instruction "MOV AX,0".

2. Do not forget the "H" indicating hex in coding INT 33H. In the absence of the "H",
the assembler assumes it is decimal and will execute DOS INT 21H since 33 decimal
is equal to 21H.


Some mouse terminology 


Before further discussion of INT 33H, some terminology concerning the mouse 
needs to be clarified. The mouse pointer (or cursor) is the pointer on the screen indicat- 
ing where the mouse is pointing at a given time. In graphics mode, the mouse pointer (cur- 
sor) is an arrow; in text mode, the mouse pointer is a flashing block. In either mode, as the 
mouse is moved, the mouse cursor is also moved. While the movement of the mouse is 
measured in inches (or centimeters), the movement of the mouse cursor (arrowhead) on 
the screen is measured in units called mickeys. Mickey units indicate mouse sensitivity. 
For example, a mouse that can move the cursor 200 units for every inch of mouse move- 
ment has a sensitivity of 200 mickeys. In this case, one mickey represents 1/200 of an inch 
on the screen. Some mice have a sensitivity of 400 mickeys in contrast to the commonly 
used 200 mickeys. In that case, for every inch of mouse movement, the mouse cursor 
moves 400 mickeys. 


Displaying and hiding the mouse cursor 


The AX = 01 function of INT 33H is used to display the mouse cursor. 


MOV AX,01 
INT 33H 




After executing the above code, the mouse pointer is displayed. If the video mode 
is graphics, the mouse arrow becomes visible. If the video mode is text, a rectangular 
block representing the mouse cursor becomes visible. In text mode, the color of the mouse 
cursor block is the opposite of the background color in order to be visible. It is best to hide 
the mouse cursor after making it visible by executing option AX = 02 of INT 33H. This 
is shown in Example 5-1. Try Example 5-1 in DEBUG (remember to omit the "H" and
place INT 3 as the last instruction when in DEBUG). Then try it with video mode AL = 03
(INT 10H, AH = 0) to see the mouse cursor in text mode.


Video resolution vs. mouse resolution in text mode 


As discussed in Chapter 4, the video screen is divided into 640 x 200 pixels in
text mode. This means that in text mode of 80 x 25 characters, each character will use 8
x 8 pixels (80 x 8 = 640 and 25 x 8 = 200). When the video mode is set to text mode (AL
= 03 with AH = 0 of INT 10H), the mouse will automatically adopt the same resolution of
640 x 200 for its horizontal and vertical coordinates. Therefore, in text mode when a
program gets the mouse cursor position, the values are provided in pixels and must be
divided by 8 to get the mouse cursor position in terms of character locations 0 to 79
(horizontal) and 0 to 24 (vertical) on the screen.


Video resolution vs. mouse resolution in graphics mode 


In graphics, resolution is not only 640 x 200 but also 640 x 350 and 640 x 480.
When the video resolution is changed to these video modes, the mouse also adopts the
graphics resolutions. See Table 5-5.


Table 5-5: Video and Mouse Resolution for Some Video Modes


Video Mode   Video Resolution   Type       Mouse Resolution   Characters per Screen
AL = 03H     640 x 200          Text       640 x 200          80 x 25
AL = 0EH     640 x 200          Graphics   640 x 200          80 x 25
AL = 0FH     640 x 350          Graphics   640 x 350          80 x 44
AL = 10H     640 x 350          Graphics   640 x 350          80 x 44
AL = 11H     640 x 480          Graphics   640 x 480          80 x 60
AL = 12H     640 x 480          Graphics   640 x 480          80 x 60


Example 5-1


(a) Use INT 10H option AH = 0FH to get the current video mode and save it in BL;
(b) set the video mode to VGA graphics using INT 10H option AH = 0 with AL = 10H;
(c) initialize the mouse with AX = 0, INT 33H; (d) make the mouse visible;
(e) use INT 21H option AH = 01 to wait for key press;
(f) if any key is pressed restore the original video mode.

Solution:

        MOV   AH,0FH    ;get the current video mode
        INT   10H
        MOV   BL,AL     ;and save it
        MOV   AH,0      ;set the video mode
        MOV   AL,10H    ;to VGA graphics
        INT   10H
        PUSH  BX        ;INT 33H, AX=0 returns the button count in BX
        MOV   AX,0      ;initialize the mouse
        INT   33H
        POP   BX        ;recover the saved video mode in BL
        MOV   AX,01     ;make the mouse cursor visible
        INT   33H
        MOV   AH,01     ;wait for key press
        INT   21H
        MOV   AX,02     ;when any key is pressed
        INT   33H       ;make mouse invisible
        MOV   AH,0
        MOV   AL,BL     ;and restore original video mode
        INT   10H




Getting the current mouse cursor position (AX = 03) 


Option AX = 03 of INT 33H gets the current position of the mouse cursor. Upon 
return, the X and Y coordinates are in registers CX (horizontal) and DX (vertical). BX 
contains the button status as follows: D0 = left button status, D1 = right button status, D2
= center button status. The status is 1 if down, 0 if up. Notice that the cursor position pro- 
vided by this function is given in pixels. For example, the position returned will be in the 
range of 0-639 (horizontal) and 0-199 (vertical) for a 640 x 200 screen in most text and 
graphics video modes. However, the mouse cursor position is often needed in terms of 
character positions such as 80 x 25 and not in terms of pixels. To get the mouse cursor 
character position, divide both the horizontal and vertical values of CX and DX by 8. See 
Programs 5-3 and 5-4. 
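
The handling of the values returned by function AX = 03 can be sketched in Python as follows. This is an off-target model for illustration only: the function name decode_mouse_status and the dictionary keys are hypothetical, and the 8 x 8 character cell is the one assumed in the text for a 640 x 200 screen.

```python
def decode_mouse_status(bx, cx, dx, cell=8):
    """Interpret the registers returned by INT 33H, AX = 03 (illustrative model)."""
    return {
        "left": bool(bx & 0b001),     # D0 = left button
        "right": bool(bx & 0b010),    # D1 = right button
        "center": bool(bx & 0b100),   # D2 = center button
        "column": cx // cell,         # horizontal pixels -> character column
        "row": dx // cell,            # vertical pixels -> character row
    }


print(decode_mouse_status(bx=1, cx=639, dx=199))
```

With the left button down and the cursor at the bottom right pixel (639, 199), the model reports character column 79 and row 24.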


TITLE   PROGRAM 5-3: DISPLAYING MOUSE POSITION
;Performs the following tasks: (a) gets the current video mode and
;saves it, (b) sets the mode to a new video mode, (c) gets the mouse
;pointer position, converts it to character position and displays it
;continuously unless a key is pressed, (d) upon pressing any key, it
;restores the original video mode and exits to DOS.
CURSOR  MACRO ROW,COLUMN
        MOV   AH,02H
        MOV   BH,00
        MOV   DH,ROW
        MOV   DL,COLUMN
        INT   10H
        ENDM
DISPLAY MACRO STRING
        MOV   AH,09H
        MOV   DX,OFFSET STRING    ;load string address
        INT   21H
        ENDM

        .MODEL SMALL
        .STACK
        .DATA
MESSAGE_1 DB  'PRESS ANY KEY TO GET OUT','$'
MESSAGE_2 DB  'THE MOUSE CURSOR IS LOCATED AT ','$'
POS_HO    DB  ?,?,' AND $'
POS_VE    DB  ?,?,'$'
OLDVIDEO  DB  ?                   ;current video mode
NEWVIDEO  DB  0EH                 ;new video mode (0EH, 640 x 200 graphics, is assumed;
                                  ;the value is not legible in this copy)
        .CODE
MAIN    PROC
        MOV   AX,@DATA
        MOV   DS,AX
        MOV   AH,0FH              ;get current video mode
        INT   10H
        MOV   OLDVIDEO,AL         ;save it
        MOV   AX,0600H            ;clear screen
        MOV   BH,07
        MOV   CX,0
        MOV   DX,184FH
        INT   10H
        MOV   AH,00H              ;set new video mode
        MOV   AL,NEWVIDEO
        INT   10H
        MOV   AX,0                ;initialize mouse
        INT   33H
        MOV   AX,01               ;show mouse cursor
        INT   33H
        CURSOR 20,20
        DISPLAY MESSAGE_1
AGAIN:  MOV   AX,03H              ;get mouse location
        INT   33H
        MOV   AX,CX               ;get the hor. pixel position
        CALL  CONVERT             ;convert to displayable data
        MOV   POS_HO,AL           ;save the tens digit
        MOV   POS_HO+1,AH         ;save the ones digit
        MOV   AX,DX               ;get the vert. pixel position
        CALL  CONVERT             ;convert
        MOV   POS_VE,AL           ;save
        MOV   POS_VE+1,AH
        CURSOR 5,20
        DISPLAY MESSAGE_2
        DISPLAY POS_HO
        DISPLAY POS_VE
        MOV   AH,01               ;check for key press
        INT   16H
        JZ    AGAIN               ;if no key press, keep monitoring mouse position
        MOV   AX,02               ;hide mouse
        INT   33H
        MOV   AH,0                ;restore original video mode
        MOV   AL,OLDVIDEO         ;load original video mode
        INT   10H
        MOV   AH,4CH              ;go back to DOS
        INT   21H
MAIN    ENDP
;CONVERT is called above but its listing is not legible in this copy.
;The version below is an assumed equivalent: it divides the pixel
;position in AX by 8 and returns two ASCII digits (AL = tens, AH = ones).
CONVERT PROC
        SHR   AX,1                ;divide by 8 to get the character position
        SHR   AX,1
        SHR   AX,1
        MOV   BL,10
        DIV   BL                  ;AL = quotient (tens), AH = remainder (ones)
        OR    AX,3030H            ;convert both digits to ASCII
        RET
CONVERT ENDP
        END   MAIN


Program 5-3

Setting the mouse pointer position (AX = 04) 


This function allows a program to set the mouse pointer to a new location any- 
where on the screen. Before calling this function, the coordinates for the new location 
must be placed in registers CX for the horizontal (x coordinate) and DX for the vertical (y 
coordinate). These values must be in pixels in the range of 0-639 and 0-199 for 640 x 200 
resolution. Coordinate (0,0) is the upper left corner of the screen. For example, to set the 
mouse cursor at location 9 x 5 (on a 80 x 25 screen), simply multiply both by 8 to get the 
pixel location. Therefore, a character coordinate of 9 x 5 becomes 72 x 40 in pixel coor- 
dinates. 
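
The character-to-pixel conversion described above can be modeled with a few lines of Python. This is a sketch under the stated assumptions (8 x 8 character cells on a 640 x 200 screen); the function name char_to_pixel is hypothetical and no mouse driver is involved.

```python
def char_to_pixel(column, row, cell=8, width=640, height=200):
    """Convert a character position to the pixel values expected in CX and DX
    by INT 33H, AX = 04 (illustrative model, 640 x 200 screen assumed)."""
    x, y = column * cell, row * cell
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("position is outside the screen")
    return x, y


cx, dx = char_to_pixel(9, 5)
print(cx, dx)   # values to load into CX and DX
```

For the character coordinate 9 x 5 used in the text, the sketch yields the pixel coordinate 72 x 40.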


Getting mouse button press information (AX = 05) 


This function is used to get information about specific button presses since the last 
call to this function. It is set up as follows. 


AX = 05 
BX = 0 for left button; 1 for right button; 2 for center button 
Upon return: 


AX = button status where 

D0 = Left button, if 1 it is down and if 0 it is up

D1 = Right button, if 1 it is down and if 0 it is up 

D2 = Center button, if 1 it is down and if 0 it is up 

BX = button press count 

CX = x-coordinate at the last button press in pixels (horizontal) 
DX = y-coordinate at the last button press in pixels (vertical) 


Notice in function AX = 05 that upon returning from INT 33H, register AX has the 
button status (up or down), while register BX has the number of times the specific button is 
pressed since the last call to this function. Program 5-4 shows one way to use this function. 




TITLE   PROGRAM 5-4: MOUSE BOX PROGRAM
;Performs the following:
; (a) gets the current video mode and saves it,
; (b) sets the video mode to a new one and clears screen,
; (c) draws a colored box and gets the mouse position,
; (d) displays different messages depending on whether the mouse is
;     clicked inside or outside the box. Pressing any key will return to DOS.
;Thanks to Travis Erck and Gary Hudson for their input on this program.
CURSOR  MACRO ROW,COLUMN
        MOV   AH,02H
        MOV   BH,00
        MOV   DH,ROW
        MOV   DL,COLUMN
        INT   10H
        ENDM
DISPLAY MACRO STRING
        MOV   AH,09H
        MOV   DX,OFFSET STRING    ;load string address
        INT   21H
        ENDM
FILL    MACRO ROW_START,COL_START,ROW_END,COL_END,COLOR
        LOCAL START,AGAIN
        MOV   DX,ROW_START
START:  MOV   CX,COL_START
AGAIN:  MOV   AH,0CH
        MOV   AL,COLOR
        INT   10H
        INC   CX
        CMP   CX,COL_END
        JNE   AGAIN
        INC   DX
        CMP   DX,ROW_END
        JNE   START
        ENDM

        .MODEL SMALL
        .STACK
        .DATA
MESSAGE_1 DB  'AN EXAMPLE OF HOW TO USE INTERRUPT 33H FOR MOUSE.','$'
MESSAGE_2 DB  'IT WORKS!','$'
MESSAGE_3 DB  'CLICK IN THE BOX TO SEE WHAT HAPPENS!','$'
MESSAGE_4 DB  'NO, NO, NO I SAID IN THE BOX!','$'
MESSAGE_5 DB  'NOW PRESS ANY KEY TO GET BACK TO DOS.$'
OLDVIDEO  DB  ?
NEWVIDEO  DB  12H
        .CODE
MAIN    PROC
        MOV   AX,@DATA
        MOV   DS,AX
        MOV   AH,0FH              ;get the current video mode
        INT   10H
        MOV   OLDVIDEO,AL         ;save it
        MOV   AX,0600H            ;clear screen
        MOV   BH,07
        MOV   CX,0
        MOV   DX,184FH
        INT   10H
        MOV   AH,00H              ;set new video mode
        MOV   AL,NEWVIDEO
        INT   10H
        CURSOR 0,0
        FILL  150,250,250,350,4   ;draw red box
        CURSOR 1,1
        DISPLAY MESSAGE_1
        CURSOR 5,22
        DISPLAY MESSAGE_3
        MOV   AX,0000H            ;initialize mouse
        INT   33H
        MOV   AX,01H              ;show mouse cursor
        INT   33H
BACK:   MOV   AX,03H              ;check for mouse button press
        INT   33H                 ;now CX=COL and DX=ROW location
        CMP   BX,0001H            ;check to see if left button is pressed
        JNE   BACK                ;if not keep checking
        CMP   CX,250              ;compare with the left edge of the box
        JB    NOT_INSIDE          ;if less it must be outside box
        CMP   CX,350              ;compare with the right edge of the box
        JA    NOT_INSIDE          ;if more then it is outside the box
        CMP   DX,150              ;check for the top of the box
        JB    NOT_INSIDE          ;if above it then outside the box
        CMP   DX,250              ;see if bottom of the box
        JA    NOT_INSIDE          ;if below it then outside the box
        CURSOR 18,18              ;then it must be inside box
        DISPLAY MESSAGE_2         ;indicate mouse is inside the box
        JMP   EXIT                ;go prepare to exit to DOS
NOT_INSIDE:
        CURSOR 20,18
        DISPLAY MESSAGE_4         ;indicate mouse is not inside box
EXIT:   MOV   AX,02H              ;hide mouse before exiting to DOS
        INT   33H
        CURSOR 22,18
        DISPLAY MESSAGE_5
        MOV   AH,07               ;wait for a key press
        INT   21H
        MOV   AH,0                ;restore original video mode
        MOV   AL,OLDVIDEO
        INT   10H
        MOV   AH,4CH              ;exit back to DOS
        INT   21H
MAIN    ENDP
        END   MAIN


Program 5-4


Monitoring and displaying the button press count program 


Program 5-5 uses the AX = 05 function to monitor the number of times the left 
button is pressed and then displays the count. It prompts the user to press the left button 
a number of times. When the user is ready to see how many times the button was pressed, 
any key can be pressed. 


TITLE   PROGRAM 5-5: DISPLAY MOUSE PRESS COUNT
;THIS PROGRAM WAITS FOR THE MOUSE PRESS COUNT AND DISPLAYS IT WHEN
;ANY KEY IS PRESSED.
;PRESS ANY KEY TO GO BACK TO DOS
        PAGE  60,132
CURSOR  MACRO ROW,COLUMN
        MOV   AH,02H
        MOV   BH,00
        MOV   DH,ROW
        MOV   DL,COLUMN
        INT   10H
        ENDM
DISPLAY MACRO STRING
        MOV   AH,09H
        MOV   DX,OFFSET STRING    ;LOAD STRING ADDRESS
        INT   21H
        ENDM

        .MODEL SMALL
        .STACK
        .DATA
MESSAGE_1 DB  'PRESS LEFT BUTTON A NUMBER OF TIMES:LESS THAN 99.','$'
MESSAGE_2 DB  'TO FIND OUT HOW MANY TIMES, PRESS ANY KEY','$'
MESSAGE_3 DB  'YOU PRESSED IT ','$'
P_COUNT   DB  ?,?,' TIMES $'
MESSAGE_4 DB  'NOW PRESS ANY KEY TO GO BACK TO DOS','$'
OLDVIDEO  DB  ?                   ;current video mode
NEWVIDEO  DB  12H                 ;new video mode
        .CODE
MAIN    PROC
        MOV   AX,@DATA
        MOV   DS,AX
        MOV   AH,0FH              ;get current video mode
        INT   10H
        MOV   OLDVIDEO,AL         ;save it
        MOV   AX,0600H            ;clear screen
        MOV   BH,07
        MOV   CX,0
        MOV   DX,184FH
        INT   10H
        MOV   AH,00H              ;set new video mode
        MOV   AL,NEWVIDEO
        INT   10H
        MOV   AX,0                ;initialize mouse
        INT   33H
        MOV   AX,01               ;show mouse cursor
        INT   33H
        CURSOR 2,1
        DISPLAY MESSAGE_1
        CURSOR 4,1
        DISPLAY MESSAGE_2
        MOV   AH,07               ;wait for key press
        INT   21H
        MOV   AX,05H              ;get mouse press count
        MOV   BX,0                ;check press count for left button
        INT   33H
        MOV   AX,BX               ;BX=button press count
        MOV   BL,10
        DIV   BL
        OR    AX,3030H            ;convert it to ASCII
        MOV   P_COUNT,AL          ;save the number
        MOV   P_COUNT+1,AH
        CURSOR 10,2
        DISPLAY MESSAGE_3
        DISPLAY P_COUNT
        CURSOR 20,2
        DISPLAY MESSAGE_4
        MOV   AH,07               ;wait for a key press to get out
        INT   21H
        MOV   AX,02               ;hide mouse
        INT   33H
        MOV   AH,0                ;restore original video mode
        MOV   AL,OLDVIDEO         ;load original video mode
        INT   10H
        MOV   AH,4CH              ;go back to DOS
        INT   21H
MAIN    ENDP
        END   MAIN


Program 5-5
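
The step in Program 5-5 that turns the press count into two printable digits (divide by 10, then OR with 3030H) can be traced with a small Python model. The sketch below is illustrative only; the function name two_digit_ascii is hypothetical, and it assumes the count is below 100, as the program's prompt requests.

```python
def two_digit_ascii(count):
    """Model of MOV BL,10 / DIV BL / OR AX,3030H for a count of 0 to 99."""
    if not 0 <= count <= 99:
        raise ValueError("count must fit in two decimal digits")
    al, ah = divmod(count, 10)         # DIV BL: AL = quotient, AH = remainder
    ax = ((ah << 8) | al) | 0x3030     # OR AX,3030H turns both digits into ASCII
    # The program stores AL at P_COUNT and AH at P_COUNT+1, in that order.
    return chr(ax & 0xFF) + chr(ax >> 8)


print(two_digit_ascii(27))
print(two_digit_ascii(5))
```

The quotient (tens digit) ends up in AL and is stored first, so a count of 27 is displayed as the characters "2" and "7", and a count of 5 as "05".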


Review Questions 


1. Which function of INT 33H is used to detect the presence of a mouse in a PC? In 
which register do we expect to get that information? In which register do we find the 
number of buttons in the mouse? 

2. The following code is an attempt to call INT 33H function 2. Is it correct? If not, cor- 
rect it. 

MOV AH,02 

INT 33H 
3. Why do we need to save the original video mode before changing it for the mouse? 
4. In INT 33H function AX = 03, how can a left button press be detected? 




PROBLEMS 


SECTION 5.1: INT 16H KEYBOARD PROGRAMMING 


1. Which function of INT 16H is used to find out which key has been pressed? 
2. In the above question, how can the ASCII value for the pressed key be obtained?


SECTION 5.2: MOUSE PROGRAMMING WITH INT 33H 


3. Using INT 33H, write and test an Assembly language program to check the presence 
of a mouse in a PC. If a mouse driver is installed, it should state the number of but- 
tons it supports. If no mouse driver is installed, it should state this. 


ANSWERS TO REVIEW QUESTIONS 


SECTION 5.1: INT 16H KEYBOARD PROGRAMMING 


1. INT 16H function AH = 01 

2. After return from INT 16H function AH = 01, if ZF = 1 there is no key press; if ZF = 
0 then a key has been pressed. 

3. If ZF = 0, then we use INT 16H function AH = 0 to get the ASCII character for the 
pressed key. 

4. INT 21H waits for the user to press the key; INT 16H scans the keyboard, allowing 
the program to continue executing other tasks while scanning for the key press. 


5.

AGAIN:  MOV   AH,02     ;USE FUNCTION AH=02 OF INT 21H
        MOV   DL,07     ;SOUND THE BELL BY SENDING OUT BELL CHAR
        INT   21H
        MOV   AH,01     ;CHECK FOR KEY PRESS
        INT   16H       ;USING INT 16H
        JZ    AGAIN     ;IF NO KEY PRESS KEEP SOUNDING THE BELL
        MOV   AH,0      ;TO GET THE CHARACTER
        INT   16H       ;WE MUST USE INT 16H ONE MORE TIME
        CMP   AL,'X'    ;IS IT 'X'?
        JE    EXIT      ;IF YES EXIT
        JMP   AGAIN     ;NO. KEEP SOUNDING THE BELL
EXIT:   MOV   AH,4CH    ;GO BACK TO DOS
        INT   21H


SECTION 5.2: MOUSE PROGRAMMING WITH INT 33H 


1. AX = 0; register AX; number of buttons in register BX
2. It is wrong. Register AL = 02 and AH = 0.
MOV AX, 02 
INT 33H 
3. Because if we don't do that, when we go back to DOS we lose our cursor if the mouse 
program has changed the video to graphics. 
4. We check the contents of register BX for value 01. 




CHAPTER 6 


SIGNED NUMBERS, STRINGS, 


AND TABLES 


OBJECTIVES 


Upon completion of this chapter, you will be able to:


>> Represent 8- or 16-bit signed numbers as used in computers
>> Convert a number to its 2's complement
>> Code signed arithmetic instructions ADD, SUB, IMUL, and IDIV
>> Demonstrate how arithmetic instructions affect the sign flag
>> Explain the difference between a carry and an overflow
>> Prevent overflow errors by sign-extending data
>> Code signed shift instructions SAL and SAR
>> Code logic instruction CMP for signed numbers and explain its effect
   on the flag register
>> Code conditional jump instructions after CMP of signed data
>> Explain the function of registers SI and DI in string instructions
>> Describe the operation of the direction flag in string instructions
>> Code instructions CLD and STD to control the direction flag
>> Describe the operation of the REP prefix
>> Code string instructions:
>> MOVSB and MOVSW for data transfer
>> STOS, LODS to store and load the contents of AX
>> CMPS to compare two strings of data
>> SCAS to scan a string for data matching that in AX
>> XLAT for table processing


This chapter deals with signed numbers and tables. In Section 6.1, we focus on 
the concept of signed numbers in software engineering. Signed number operations are 
explained along with examples. We discuss string operations and table processing in 
Section 6.2. 


SECTION 6.1: SIGNED NUMBER ARITHMETIC OPERATIONS 


All data items used so far have been unsigned numbers, meaning that the entire 
8-bit or 16-bit operand was used for the magnitude. Many applications require signed 
data. In this section the concept of signed numbers is discussed along with related instruc- 
tions. 


Concept of signed numbers in computers 


In everyday life, numbers are used that could be positive or negative. For exam- 
ple, a temperature of 5 degrees below zero can be represented as —5, and 20 degrees above 
zero as +20. Computers must be able to accommodate such numbers. To do that, comput- 
er scientists have devised the following arrangement for the representation of signed pos- 
itive and negative numbers: The most significant bit (MSB) is set aside for the sign (+ or 
-) and the rest of the bits are used for the magnitude. The sign is represented by 0 for
positive (+) numbers and 1 for negative (-) numbers. Signed byte and word representations
are discussed below. 


Signed byte operands


In signed byte operands, D7 (MSB) is the sign and D0 to D6 are set aside for the
magnitude of the number. If D7 = 0, the operand is positive, and if D7 = 1, it is negative.


Positive numbers


The range of positive numbers that can be represented by the format above is 0 to
+127.


    0       0000 0000
    +1      0000 0001
    +5      0000 0101
    ....    .... ....
    +127    0111 1111


If a positive number is larger than +127, a word-sized operand must be used.
Word operands are discussed later.


Negative numbers 


For negative numbers D7 is 1, but the magnitude is represented in 2's comple- 
ment. Although the assembler does the conversion, it is still important to understand how 
the conversion works. To convert to negative number representation (2's complement), 
follow these steps: 


1. Write the magnitude of the number in 8-bit binary (no sign). 
2. Invert each bit. 
3. Add 1 to it.
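
The three steps above can be sketched in Python as a quick check. This is an illustration rather than anything the assembler actually runs; the function name twos_complement and its bits parameter are hypothetical.

```python
def twos_complement(magnitude, bits=8):
    """Follow the three steps: write the magnitude, invert each bit, add 1."""
    mask = (1 << bits) - 1            # 0xFF for a byte
    value = magnitude & mask          # step 1: magnitude in `bits` binary digits
    inverted = ~value & mask          # step 2: invert each bit
    return (inverted + 1) & mask      # step 3: add 1, keeping only `bits` bits


for m in (5, 0x34, 128):
    print(f"-{m}: {twos_complement(m):08b} = {twos_complement(m):02X}H")
```

The same function handles word-sized operands when called with bits=16.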


Examples 6-1, 6-2, and 6-3 demonstrate these three steps. From these examples
it is clear that the range of byte-sized negative numbers is -1 to -128. The following
lists byte-sized signed number ranges:


Decimal    Binary       Hex
-128       1000 0000    80
-127       1000 0001    81
-126       1000 0010    82
....       .... ....    ..
-2         1111 1110    FE
-1         1111 1111    FF
0          0000 0000    00
+1         0000 0001    01
+2         0000 0010    02
....       .... ....    ..
+127       0111 1111    7F


Example 6-1


Show how the computer would represent -5.


Solution:


1.  0000 0101    5 in 8-bit binary
2.  1111 1010    invert each bit
3.  1111 1011    add 1 (hex = FBH)


This is the signed number representation in 2's complement for -5.


Example 6-2


Show -34H as it is represented internally.

Solution:

1.  0011 0100
2.  1100 1011
3.  1100 1100    (which is CCH)


Example 6-3


Show the representation for -128 (decimal).

Solution:

1.  1000 0000
2.  0111 1111
3.  1000 0000    Notice that this is not negative zero (-0).


Word-sized signed numbers 


In x86 computers a word is 16 bits in length. Setting aside the MSB (D15) for the 
sign leaves a total of 15 bits (D14—D0) for the magnitude. This gives a range of -32768 
to +32767. If a number is larger than this, it must be treated as a multiword operand and 
be processed chunk by chunk the same way as unsigned numbers (as discussed in Chapter 
3). The following shows the range of signed word operands. To convert a negative num- 
ber to its word operand representation, the three steps discussed in negative byte operands 


are used.

[Figure: signed word format. D15 is the sign bit; D14 through D0 hold the magnitude.]


Decimal     Binary                 Hex
-32,768     1000 0000 0000 0000    8000
-32,767     1000 0000 0000 0001    8001
-32,766     1000 0000 0000 0010    8002
....        .... .... .... ....    ....
-2          1111 1111 1111 1110    FFFE
-1          1111 1111 1111 1111    FFFF
0           0000 0000 0000 0000    0000
+1          0000 0000 0000 0001    0001
+2          0000 0000 0000 0010    0002
....        .... .... .... ....    ....
+32,766     0111 1111 1111 1110    7FFE
+32,767     0111 1111 1111 1111    7FFF

Overflow problem in signed number operations 


When using signed numbers, a serious problem arises that must be dealt with. 
This is the overflow problem. The CPU indicates the existence of the problem by raising 
the OF (overflow) flag, but it is up to the programmer to take care of it. The CPU under- 
stands only Os and 1s and ignores the human convention of positive and negative num- 
bers. Now what is an overflow? If the result of an operation on signed numbers is too large 
for the register, an overflow occurs and the programmer must be notified. Look at 
Example 6-4. 


Example 6-4 


Look at the following code and data segments:


DATA1   DB    +96
DATA2   DB    +70

        MOV   AL,DATA1    ;AL=0110 0000 (AL=60H)
        MOV   BL,DATA2    ;BL=0100 0110 (BL=46H)
        ADD   AL,BL       ;AL=1010 0110 (AL=A6H= -90 invalid!)


    + 96    0110 0000
  + + 70    0100 0110
    +166    1010 0110    According to the CPU, this is -90, which is wrong. (OF = 1, SF = 1, CF = 0)


In the example above, +96 is added to +70 and the result according to the CPU is 
-90. Why? The reason is that the result was more than what AL could handle. Like all 
other 8-bit registers, AL could only contain up to +127. The designers of the CPU creat- 
ed the overflow flag specifically for the purpose of informing the programmer that the 
result of the signed number operation is erroneous. 


When the overflow flag is set in 8-bit operations 


In 8-bit signed number operations, OF is set to 1 if either of the following two 
conditions occurs: 


1. There is a carry from D6 to D7 but no carry out of D7 (CF = 0). 
2. There is a carry from D7 out (CF = 1) but no carry from D6 to D7. 


In other words, the overflow flag is set to 1 if there is a carry from D6 to D7 or 
from D7 out, but not both. This means that if there is a carry both from D6 to D7 and from 
D7 out, OF = 0. In Example 6-4, since there is only a carry from D6 to D7 and no carry 
from D7 out, OF = 1. Examples 6-5, 6-6, and 6-7 give further illustrations of the overflow 
flag in signed arithmetic. 
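
The two-carry rule can be checked with a short Python model of an 8-bit ADD. This is a sketch for illustration, not a description of the CPU's internal circuitry; the function name add8 and the dictionary of flags are hypothetical.

```python
def add8(a, b):
    """Model an 8-bit ADD and derive CF, OF, and SF from the two carries."""
    a &= 0xFF                                      # keep operands to 8 bits
    b &= 0xFF                                      # (negative ints become 2's complement)
    carry_d6_to_d7 = ((a & 0x7F) + (b & 0x7F)) >> 7
    total = a + b
    carry_out_of_d7 = total >> 8                   # this is CF
    result = total & 0xFF
    flags = {
        "CF": carry_out_of_d7,
        "OF": carry_d6_to_d7 ^ carry_out_of_d7,    # one carry or the other, not both
        "SF": result >> 7,                         # copy of D7 of the result
    }
    return result, flags


for a, b in ((96, 70), (-128, -2), (-2, -5), (7, 18)):
    result, flags = add8(a, b)
    print(f"{a:+} + {b:+} -> {result:02X}H  {flags}")
```

The four operand pairs are those used in Examples 6-4 through 6-7; the exclusive OR of the two carries is what the phrase "but not both" expresses.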




Example 6-5


Observe the results of the following:


        MOV   DL,-128     ;DL=1000 0000 (DL=80H)
        MOV   CH,-2       ;CH=1111 1110 (CH=FEH)
        ADD   DL,CH       ;DL=0111 1110 (DL=7EH=+126 invalid!)


    -128    1000 0000
  + -  2    1111 1110
    -130    0111 1110    OF = 1, SF = 0 (positive), CF = 1


According to the CPU, the result is +126, which is wrong. The error is indicated by the fact that
OF = 1.


Example 6-6


Observe the results of the following:


        MOV   AL,-2       ;AL=1111 1110 (AL=FEH)
        MOV   CL,-5       ;CL=1111 1011 (CL=FBH)
        ADD   CL,AL       ;CL=1111 1001 (CL=F9H=-7 which is correct)


    - 2     1111 1110
  + - 5     1111 1011
    - 7     1111 1001    OF = 0, CF = 1, and SF = 1 (negative); the result is correct since OF = 0.


Example 6-7


Observe the results of the following:


        MOV   DH,+7       ;DH=0000 0111 (DH=07H)
        MOV   BH,+18      ;BH=0001 0010 (BH=12H)
        ADD   BH,DH       ;BH=0001 1001 (BH=19H=+25, correct)


    + 7     0000 0111
  + +18     0001 0010
    +25     0001 1001    OF = 0, CF = 0, and SF = 0 (positive).


Overflow flag in 16-bit operations 


In a 16-bit operation, OF is set to 1 in either of two cases: 


1. There is a carry from D14 to D15 but no carry out of D15 (CF = 0). 
2. There is a carry from D15 out (CF = 1) but no carry from D14 to D15. 


Again the overflow flag is low (not set) if there is a carry from both D14 to D15 
and from D15 out. The OF is set to 1 only when there is a carry from D14 to D15 or from 
D15 out, but not from both. See Examples 6-8 and 6-9. 
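
An equivalent way to state the same rule, useful as a cross-check, is that a signed addition overflows when both operands have the same sign and the result has the opposite sign. The Python sketch below applies that restatement to 16-bit operands; it is an illustration that is not taken from the text, and the function names add16 and to_signed16 are hypothetical.

```python
def to_signed16(value):
    """Interpret a 16-bit pattern as a signed number."""
    return value - 0x10000 if value & 0x8000 else value


def add16(a, b):
    """Model a 16-bit ADD; report overflow by comparing sign bits (D15)."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a + b) & 0xFFFF
    same_sign_in = (a >> 15) == (b >> 15)
    sign_changed = (result >> 15) != (a >> 15)
    return result, same_sign_in and sign_changed


for a, b in ((0x6E2F, 0x13D4), (0x542F, 0x12E0)):
    result, overflow = add16(a, b)
    print(f"{a:04X}H + {b:04X}H = {result:04X}H "
          f"(signed {to_signed16(result)}), OF = {int(overflow)}")
```

The two operand pairs are the ones used in Examples 6-8 and 6-9.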


Avoiding erroneous results in signed number operations 


To avoid the problems associated with signed number operations, one can sign- 
extend the operand. Sign extension copies the sign bit (D7) of the lower byte of a register 
into the upper bits of the register, or copies the sign bit of a 16-bit register into another 
register. CBW (convert signed byte to signed word) and CWD (convert signed word to 
signed double word) are used to perform sign extension. They work as follows: 




Example 6-8


Observe the results in the following:


        MOV   AX,6E2FH    ;  28,207
        MOV   CX,13D4H    ;+  5,076
        ADD   AX,CX       ;= 33,283 is the expected answer


    6E2F    0110 1110 0010 1111
  + 13D4    0001 0011 1101 0100
    8203    1000 0010 0000 0011 = -32,253 incorrect!
            OF = 1, CF = 0, SF = 1


Example 6-9


Observe the results in the following:


        MOV   DX,542FH    ;  21,551
        MOV   BX,12E0H    ;+  4,832
        ADD   DX,BX       ;= 26,383


    542F    0101 0100 0010 1111
  + 12E0    0001 0010 1110 0000
    670F    0110 0111 0000 1111 = 26,383 (correct answer); OF = 0, CF = 0, SF = 0


CBW will copy D7 (the sign flag) to 


7 
all bits of AH. This is demonstrated below. -= 
Notice that the operand is assumed to be AL 


and the previous contents of AH are destroyed. AH AL 
MOV AL, +96 7;AL=0110 0000 
CBW ;now AH=0000 0000 and AL=0110 0000 
or: 
MOV AL, -2 7;AL=1111 1110 
CBW Fo =A ool EO tl C= Xo I= S LLO 


CWD sign-extends AX. It copies D15 of AX to all bits of the DX register. This is 
used for signed word operands. This is illustrated below. 


[Figure: D15 of AX is copied into every bit of DX]

Look at the following example:


        MOV AX,+260     ;AX=0000 0001 0000 0100 or AX=0104H
        CWD             ;DX=0000H and AX=0104H


Another example:


        MOV AX,-32766   ;AX=1000 0000 0000 0010B or AX=8002H
        CWD             ;DX=FFFFH and AX=8002H




As can be seen in the examples above, CWD does not alter AX. The sign of AX 
is copied to the DX register. How can these instructions help correct the overflow error? 
To answer that question, Example 6-10 shows Example 6-4 rewritten to correct the over- 
flow problem. 

In Example 6-10, if the overflow flag is not raised (OF = 0), the result of the 
signed number is correct and JNO (jump if no overflow) will jump to OVER. However, 
if OF = 1, which means that the result is erroneous, each operand must be sign-extended 
and then added. That is the function of the code below the JNO instruction. The program 
in Example 6-10 works for addition of any two signed bytes. 


Example 6-10


Rewrite Example 6-4 to provide for handling the overflow problem.


Solution:


DATA1   DB      +96
DATA2   DB      +70
RESULT  DW      ?

        SUB     AH,AH           ;AH=0
        MOV     AL,DATA1        ;GET OPERAND 1
        MOV     BL,DATA2        ;GET OPERAND 2
        ADD     AL,BL           ;ADD THEM
        JNO     OVER            ;IF OF=0 THEN GO TO OVER
        MOV     AL,DATA2        ;OTHERWISE GET OPERAND 2 TO
        CBW                     ;SIGN EXTEND IT
        MOV     BX,AX           ;SAVE IT IN BX
        MOV     AL,DATA1        ;GET BACK OPERAND 1 TO
        CBW                     ;SIGN EXTEND IT
        ADD     AX,BX           ;ADD THEM AND
OVER:   MOV     RESULT,AX       ;SAVE IT


The following is an analysis of the values in Example 6-10. Each is sign-extended
and then added as follows:


        S  AH        AL
        0  000 0000  0110 0000    +96 after sign extension
        0  000 0000  0100 0110    +70 after sign extension
        0  000 0000  1010 0110    +166
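To see why extending the operands first removes the error, the same arithmetic can be modeled in a few lines of Python. This is only a sketch of what CBW and CWD do to the register values; the function names cbw and cwd are hypothetical stand-ins for the instructions, and registers are represented as plain integers.

```python
def cbw(al):
    """Model CBW: copy D7 of AL into all bits of AH; return the 16-bit AX."""
    al &= 0xFF
    return (0xFF00 | al) if al & 0x80 else al


def cwd(ax):
    """Model CWD: copy D15 of AX into all bits of DX; return (DX, AX)."""
    ax &= 0xFFFF
    return (0xFFFF if ax & 0x8000 else 0x0000), ax


byte_sum = (96 + 70) & 0xFF                 # 8-bit ADD AL,BL
word_sum = (cbw(96) + cbw(70)) & 0xFFFF     # 16-bit ADD AX,BX after CBW

print(hex(byte_sum))    # 0xa6, which reads as -90 in a signed byte
print(hex(word_sum))    # 0xa6 in a word, that is +166
print(hex(cbw(0xFE)))   # 0xfffe (-2 as a word)
print(cwd(0x8002))      # (65535, 32770), that is DX=FFFFH and AX=8002H
```

The bit pattern A6H is the same in both sums; what changes is the position of the sign bit, which moves from D7 to D15 once the operands are words.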


As a rule, if the possibility of overflow exists, all byte-sized signed numbers
should be sign-extended into a word, and similarly, all word-sized signed operands should 
be sign-extended before they are processed. This will be shown shortly in Program 6-1. 
Before discussing that, it is important to understand the division and multiplication of 
signed operands. 


IDIV (Signed number division) 


The Intel manual says that IDIV means "integer division"; it is used for signed 
number division. In actuality, all arithmetic instructions of the 8088/86 are for integer 
numbers regardless of whether the operands are signed or unsigned. To perform opera- 
tions on real numbers, the 8087 coprocessor is used. Remember that real numbers are the 
ones with decimal points such as "3.56". Division of signed numbers is very similar to 
the division of unsigned numbers discussed in Chapter 3. Table 6-1 summarizes signed 
number division. 




Table 6-1: Signed Division Summary


Division          Numerator            Denominator          Quotient   Rem.

byte/byte         AL = byte CBW        register or memory   AL         AH
word/word         AX = word CWD        register or memory   AX         DX
word/byte         AX = word            register or memory   AL¹        AH
doubleword/word   DXAX = doubleword    register or memory   AX²        DX

Notes:

1. Divide error interrupt if the quotient is outside the range -127 to +127.
2. Divide error interrupt if the quotient is outside the range -32,767 to +32,767.


Table 6-2: Signed Multiplication Summary


Multiplication   Operand 1       Operand 2            Result

byte x byte      AL              register or memory   AX¹
word x word      AX              register or memory   DX AX²
word x byte      AL = byte CBW   register or memory   DX AX²

Notes:

1. CF = 1 and OF = 1 if AH has part of the result, but if the result is not large enough to need the AH,
the sign bit is copied to the unused bits and the CPU makes CF = 0 and OF = 0 to indicate that.

2. CF = 1 and OF = 1 if DX has part of the result, but if the result is not large enough to need the
DX, the sign bit is copied to the unused bits and the CPU makes CF = 0 and OF = 0 to indicate that.
One can use the J condition instructions to find out which of the conditions above has occurred. The
rest of the flags are undefined.


        TITLE   PROG6-1 FIND THE AVERAGE TEMPERATURE
        PAGE    60,132
        .MODEL SMALL
        .STACK 64

        .DATA
SIGN_DAT  DB    +13,-10,+19,+14,-18,-9,+12,-19,+16
          ORG   0010H
AVERAGE   DW    ?
REMAINDER DW    ?

        .CODE
MAIN    PROC
        MOV     AX,@DATA
        MOV     DS,AX
        MOV     CX,9                    ;LOAD COUNTER
        SUB     BX,BX                   ;CLEAR BX, USED AS ACCUMULATOR
        MOV     SI,OFFSET SIGN_DAT      ;SET UP POINTER
BACK:   MOV     AL,[SI]                 ;MOVE BYTE INTO AL
        CBW                             ;SIGN EXTEND INTO AX
        ADD     BX,AX                   ;ADD TO BX
        INC     SI                      ;INCREMENT POINTER
        LOOP    BACK                    ;LOOP IF NOT FINISHED
        MOV     AL,9                    ;MOVE COUNT TO AL
        CBW                             ;SIGN EXTEND INTO AX
        MOV     CX,AX                   ;SAVE DENOMINATOR IN CX
        MOV     AX,BX                   ;MOVE SUM TO AX
        CWD                             ;SIGN EXTEND THE SUM
        IDIV    CX                      ;FIND THE AVERAGE
        MOV     AVERAGE,AX              ;STORE THE AVERAGE (QUOTIENT)
        MOV     REMAINDER,DX            ;STORE THE REMAINDER
        MOV     AH,4CH
        INT     21H                     ;GO BACK TO DOS
MAIN    ENDP
        END     MAIN


Program 6-1




IMUL (Signed number multiplication) 


Signed number multiplication is similar in its operation to the unsigned multipli- 
cation described in Chapter 3. The only difference between them is that the operands in 
signed number operations can be positive or negative; therefore, the result must indicate 
the sign. Table 6-2 summarizes signed number multiplication; it is similar to Table 3-1. 

An application of signed number arithmetic is given in Program 6-1. It computes 
the average of the following Celsius temperatures: +13, -10, +19, +14, -18, -9, +12, 
-19, and +16. 

The program is written in such a way as to handle any overflow that may occur. 
In Program 6-1, each byte of data was sign-extended and added to BX, computing the total 
sum, which is a signed word. Then the sum and the count were sign-extended, and by 
dividing the total sum by the count (number of bytes, which in this case is 9), the average 
was calculated. 

The following is the "lst" file of Program 6-1. Notice the signed number format
provided by the assembler.


Microsoft (R) Macro Assembler Version 5.10


                          TITLE   PROG6-1 FIND THE AVERAGE TEMPERATURE
                          PAGE    60,132
0000                              .MODEL SMALL
0000                              .STACK 64
0000                              .DATA
0000 0D F6 13 0E EE F7    SIGN_DAT  DB  +13,-10,+19,+14,-18,-9,+12,-19,+16
     0C ED 10
0010                                ORG 0010H
0010 0000                 AVERAGE   DW  ?
0012 0000                 REMAINDER DW  ?
0014                              .CODE
0000                      MAIN    PROC
0000 B8 ---- R                    MOV  AX,@DATA
0003 8E D8                        MOV  DS,AX
0005 B9 0009                      MOV  CX,9                 ;LOAD COUNTER
0008 2B DB                        SUB  BX,BX                ;CLEAR BX, USED AS ACC
000A BE 0000 R                    MOV  SI,OFFSET SIGN_DAT   ;SET UP POINTER
000D 8A 04                BACK:   MOV  AL,[SI]              ;MOVE BYTE INTO AL
000F 98                           CBW                       ;SIGN EXTEND INTO AX
0010 03 D8                        ADD  BX,AX                ;ADD TO BX
0012 46                           INC  SI                   ;INCREMENT POINTER
0013 E2 F8                        LOOP BACK                 ;LOOP IF NOT FINISHED
0015 B0 09                        MOV  AL,9                 ;MOVE COUNT TO AL
0017 98                           CBW                       ;SIGN EXTEND INTO AX
0018 8B C8                        MOV  CX,AX                ;SAVE DENOMINATOR IN CX
001A 8B C3                        MOV  AX,BX                ;MOVE SUM TO AX
001C 99                           CWD                       ;SIGN EXTEND THE SUM
001D F7 F9                        IDIV CX                   ;FIND THE AVERAGE
001F A3 0010 R                    MOV  AVERAGE,AX           ;STORE THE AVERAGE (QU
0022 89 16 0012 R                 MOV  REMAINDER,DX         ;STORE THE REMAINDER
0026 B4 4C                        MOV  AH,4CH
0028 CD 21                        INT  21H                  ;GO BACK TO DOS
002A                      MAIN    ENDP
                                  END  MAIN


List File for Program 6-1


Arithmetic shift 


As discussed in Chapter 3, there are two types of shifts: logical and arithmetic. 
Logical shift, which is used for unsigned numbers, was discussed previously. The arith- 
metic shift is used for signed numbers. It is basically the same as the logical shift, except 
that the sign bit is copied to the shifted bits. SAR (shift arithmetic right) and SAL (shift 
arithmetic left) are two instructions for the arithmetic shift. 


SAR (shift arithmetic right) 


        SAR destination,count

[Figure: each bit moves one position toward the LSB, the MSB keeps its value, and the LSB moves into CF]


As the bits of the destination are shifted to the right into CF, the empty bits are 
filled with the sign bit. One can use the SAR instruction to divide a signed number by 2, 
as shown next: 


        MOV AL,-10      ;AL=-10=F6H=1111 0110
        SAR AL,1        ;AL is arithmetic shifted right once
                        ;AL=1111 1011=FBH=-5


Example 6-11 demonstrates the use of the SAR instruction.

Example 6-11
Using DEBUG, evaluate the results of the following:


        MOV  AX,-9
        MOV  BL,2
        IDIV BL         ;divide -9 by 2 results in FCH
        MOV  AX,-9
        SAR  AX,1       ;divide -9 by 2 with arithmetic shift
                        ;results in FBH


Solution: 


The DEBUG trace demonstrates that an IDIV of -9 by 2 gives FCH (-4), whereas SAR -9
gives FBH (-5). This is because SAR rounds negative numbers down (toward negative
infinity), but IDIV rounds them up (it truncates the quotient toward zero).
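The difference between the two roundings is easy to reproduce outside DEBUG. The Python sketch below models only the arithmetic of IDIV and SAR, not the registers or flags; the function names idiv and sar are hypothetical, and the low byte is masked to compare with the hex values quoted above.

```python
def idiv(dividend, divisor):
    """Model IDIV: quotient truncated toward zero, remainder takes the dividend's sign."""
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    return quotient, remainder


def sar(value, count):
    """Model SAR on a signed value; Python's >> also copies the sign bit."""
    return value >> count


q, r = idiv(-9, 2)
print(q, r, hex(q & 0xFF))                 # -4 -1 0xfc
print(sar(-9, 1), hex(sar(-9, 1) & 0xFF))  # -5 0xfb
print(sar(-10, 1))                         # -5, exact because -10 is even
```

The two methods agree whenever the dividend is even or positive; they differ by one only for odd negative dividends.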


SAL (shift arithmetic left) and SHL (shift left) 


These two instructions do exactly the same thing. It is basically the same instruc- 
tion with two mnemonics. As far as signed numbers are concerned, there is no need for 
SAL. For a discussion of SHL (SAL), see Chapter 3.


Signed number comparison 


CMP dest, source 


Although the CMP (compare) instruction is the same for both signed and 
unsigned numbers, the J condition instruction used to make a decision for the signed num- 
bers is different from that used for the unsigned numbers. While in unsigned number com- 
parisons CF and ZF are checked for conditions of larger, equal, and smaller (see Chapter 
3), in signed number comparison, OF, ZF, and SF are checked: 


        destination > source    OF=SF and ZF=0
        destination = source    ZF=1
        destination < source    OF=negation of SF




The mnemonics used to detect the conditions above are as follows: 


        JG      Jump Greater            jump if OF=SF and ZF=0
        JGE     Jump Greater or Equal   jump if OF=SF
        JL      Jump Less               jump if OF=inverse of SF
        JLE     Jump Less or Equal      jump if OF=inverse of SF or ZF=1
        JE      Jump if Equal           jump if ZF=1


Example 6-12 should help clarify how the condition flags are affected by the compare
instruction. Program 6-2 is an example of the application of the signed number
comparison. It uses the data in Program 6-1 and finds the lowest temperature.
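As a cross-check on the flag rules, the following Python sketch models an 8-bit CMP and two of the signed jumps. It follows the Intel definitions (JG is taken when ZF = 0 and SF = OF; JL is taken when SF differs from OF). The helper names cmp8, jg, and jl are hypothetical, and the flags are returned in a dictionary purely for readability.

```python
def cmp8(dest, src):
    """Model an 8-bit CMP (dest - src); return the flags as a dict."""
    d, s = dest & 0xFF, src & 0xFF
    diff = (d - s) & 0xFF
    return {
        "ZF": int(diff == 0),
        "SF": diff >> 7,
        "CF": int(d < s),                    # unsigned borrow
        "OF": ((d ^ s) & (d ^ diff)) >> 7,   # signed result out of range
    }


def jg(f):
    """Jump Greater: taken when ZF = 0 and SF = OF."""
    return f["ZF"] == 0 and f["SF"] == f["OF"]


def jl(f):
    """Jump Less: taken when SF differs from OF."""
    return f["SF"] != f["OF"]


# The comparisons of Example 6-12, with AL = -5
for src in (-9, -2, -5, 7):
    f = cmp8(-5, src)
    print(src, f, "JG" if jg(f) else "JL" if jl(f) else "JE")
```

Comparing +100 with -100 shows why SF alone is not enough: the subtraction overflows, so SF = 1 and OF = 1, and JG is still taken because the two flags are equal.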


Example 6-12 


Show the DEBUG trace of the following instructions comparing several signed numbers. 


        MOV AL,-5
        CMP AL,-9
        CMP AL,-2
        CMP AL,-5
        CMP AL,+7


Solution: 


C>debug
-a 100
103D:0100 MOV AL,FB
103D:0102 CMP AL,F7
103D:0104 CMP AL,FE
103D:0106 CMP AL,FB
103D:0108 CMP AL,07
103D:010A INT 3
103D:010B
-t=100,5


AX=00FB BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0102 NV UP DI PL NZ NA PO NC
103D:0102 3CF7 CMP AL,F7


AX=00FB BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0104 NV UP DI PL NZ NA PO NC
103D:0104 3CFE CMP AL,FE


AX=00FB BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0106 NV UP DI NG NZ AC PO CY
103D:0106 3CFB CMP AL,FB


AX=00FB BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0108 NV UP DI PL ZR NA PE NC
103D:0108 3C07 CMP AL,07


AX=00FB BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=010A NV UP DI NG NZ NA PO NC
103D:010A CC INT 3


-q




;FIND THE LOWEST TEMPERATURE
        .MODEL SMALL
        .STACK 64

        .DATA
SIGN_DAT  DB    +13,-10,+19,+14,-18,-9,+12,-19,+16
          ORG   0010H
LOWEST    DB    ?

        .CODE
MAIN    PROC
        MOV     AX,@DATA
        MOV     DS,AX
        MOV     CX,8                    ;LOAD COUNTER (NUMBER ITEMS - 1)
        MOV     SI,OFFSET SIGN_DAT      ;SET UP POINTER
        MOV     AL,[SI]                 ;AL HOLDS LOWEST VALUE FOUND SO FAR
BACK:   INC     SI                      ;INCREMENT POINTER
        CMP     AL,[SI]                 ;COMPARE NEXT BYTE TO LOWEST
        JLE     SEARCH                  ;IF AL IS LOWEST, CONTINUE SEARCH
        MOV     AL,[SI]                 ;OTHERWISE SAVE NEW LOWEST
SEARCH: LOOP    BACK                    ;LOOP IF NOT FINISHED
        MOV     LOWEST,AL               ;SAVE LOWEST TEMPERATURE
        MOV     AH,4CH
        INT     21H                     ;GO BACK TO DOS
MAIN    ENDP
        END     MAIN


Program 6-2
Review Questions 


1. In an 8-bit operand, bit ____ is used for the sign bit, whereas in a 16-bit operand, bit
____ is used for the sign bit.

2. Convert 16H to its 2's complement representation.

3. The range of byte-sized signed operands is - ____ to + ____ . The range of word-sized
signed operands is - ____ to + ____ .

4. Explain the difference between an overflow and a carry.

5. Explain the purpose of the CBW and CWD instructions. Demonstrate the effect of
CBW on AL = F6H. Demonstrate the effect of CWD on AX = 124CH.

6. The instruction for signed multiplication is ____ . The instruction for signed division
is ____ .

7. Explain the difference between the SHR (discussed in Chapter 3) and SAR instruc- 
tions. 


8. For each of the following instructions, indicate the flag condition necessary for each 
jump to occur: 
(a) JLE (b) JG 


SECTION 6.2: STRING AND TABLE OPERATIONS 


There is a group of instructions referred to as string instructions in the x86 fami- 
ly of microprocessors. They are capable of performing operations on a series of operands 
located in consecutive memory locations. For example, while the CMP instruction can 
compare only 2 bytes (or words) of data, the CMPS (compare string) instruction is capa- 
ble of comparing two arrays of data located in memory locations pointed at by the SI and 
DI registers. These instructions are very powerful and can be used in many applications, 
as will be shown shortly. 




Use of SI and DI, DS and ES in string instructions 


For string operations to work, designers of CPUs must set aside certain registers 
for specific functions. These registers must permanently provide the source and destina- 
tion operands. This is exactly what the designers of the x86 have done. In 8088/86 micro- 
processors, the SI and DI registers always point to the source and destination operands, 
respectively. Now the question is: Which segments are they combined with to generate the 
20-bit physical address? To generate the physical address, the 8088/86 always uses SI as 
the offset of the DS (data segment) register and DI as the offset of ES (extra segment). 
This is the default mode. It must be noted that the ES register must be initialized for the 
string operation to work. 


Byte and word operands in string instructions 


In each of the string instructions, the operand can be a byte or a word. Operands 
are distinguished by the letters B (byte) and W (word) in the instruction mnemonic. 
Table 6-3 provides a summary of all the string instructions. Each one will be discussed 
separately in the context of examples. 


Table 6-3: String Operation Summary


Instruction            Mnemonic   Destination   Source   Prefix

move string byte       MOVSB      ES:DI         DS:SI    REP
move string word       MOVSW      ES:DI         DS:SI    REP
store string byte      STOSB      ES:DI         AL       REP
store string word      STOSW      ES:DI         AX       REP
load string byte       LODSB      AL            DS:SI    none
load string word       LODSW      AX            DS:SI    none
compare string byte    CMPSB      ES:DI         DS:SI    REPE/REPNE
compare string word    CMPSW      ES:DI         DS:SI    REPE/REPNE
scan string byte       SCASB      ES:DI         AL       REPE/REPNE
scan string word       SCASW      ES:DI         AX       REPE/REPNE


DF, the direction flag 


To process operands located in consecutive memory locations requires that the 
pointer be incremented or decremented. In string operations this is achieved by the
direction flag. Of the 16 bits of the flag register (D0-D15), bit 11 (D10) is set aside for the
direction flag (DF). It is the job of the string instruction to increment or decrement the SI 
and DI pointers, but it is the job of the programmer to specify the choice of increment or 
decrement by setting the direction flag to high or low. The instructions CLD (clear direc- 
tion flag) and STD (set direction flag) are specifically designed for that. 

CLD (clear direction flag) will reset (put to zero) DF, indicating that the string 
instruction should increment the pointers automatically. This automatic incrementation 
sometimes is referred to as autoincrement. 

STD (set the direction flag) performs the opposite function of the CLD instruc- 
tion. It sets DF to 1, indicating to the string instruction that the pointers SI and DI should 
be decremented automatically. 


REP prefix 


The REP (repeat) prefix allows a string instruction to perform the operation 
repeatedly. Now the question is: How many times is it repeated? REP assumes that CX 
holds the number of times that the instruction should be repeated. In other words, the REP 
prefix tells the CPU to perform the string operation and then decrements the CX register 
automatically. This process is repeated until CX becomes zero. To understand some of the 
concepts discussed so far, look at Example 6-13. 




Example 6-13 


Using string instructions, write a program that transfers a block of 20 bytes of data. 


Solution: 


;in the data segment:
DATA1   DB      'ABCDEFGHIJKLMNOPQRST'
        ORG     30H
DATA2   DB      20 DUP (?)

;in the code segment:
        MOV     AX,@DATA
        MOV     DS,AX                   ;INITIALIZE THE DATA SEGMENT
        MOV     ES,AX                   ;INITIALIZE THE EXTRA SEGMENT
        CLD                             ;CLEAR DIRECTION FLAG FOR AUTOINCREMENT
        MOV     SI,OFFSET DATA1         ;LOAD THE SOURCE POINTER
        MOV     DI,OFFSET DATA2         ;LOAD THE DESTINATION POINTER
        MOV     CX,20                   ;LOAD THE COUNTER
        REP     MOVSB                   ;REPEAT UNTIL CX BECOMES ZERO


In Example 6-13, after the transfer of every byte by the MOVSB instruction, both 
the SI and DI registers are incremented automatically once only (notice CLD). The REP 
prefix causes the CX counter to be decremented and MOVSB is repeated until CX 
becomes zero. Notice in Example 6-13 that both DS and ES are set to the same value. 

An alternative solution for Example 6-13 would change only two lines of code: 


MOV CX, 10 
REP MOVSW 


In this case the MOVSW will transfer a word (2 bytes) at a time and increment 
the SI and DI registers each twice. REP will repeat that process until CX becomes zero. 
Notice that the CX has the value of 10 in it since 10 words is equal to 20 bytes. 
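The bookkeeping that REP and the direction flag perform can be followed step by step in a small model. The Python sketch below treats memory as one flat bytearray, which corresponds to DS and ES holding the same value as in Example 6-13; the function name rep_movs and its parameters are hypothetical.

```python
def rep_movs(mem, si, di, cx, width=1, df=0):
    """Model REP MOVSB (width=1) or REP MOVSW (width=2) in one flat segment."""
    step = -width if df else width
    while cx != 0:
        mem[di:di + width] = mem[si:si + width]   # copy one byte or word
        si += step                                # adjust both pointers
        di += step
        cx -= 1                                   # REP counts CX down to zero
    return si, di, cx


mem = bytearray(0x50)
mem[0:20] = b"ABCDEFGHIJKLMNOPQRST"              # DATA1 at offset 0
print(rep_movs(mem, 0x00, 0x30, 20))             # (20, 68, 0): REP MOVSB
print(bytes(mem[0x30:0x44]))                     # b'ABCDEFGHIJKLMNOPQRST'
print(rep_movs(mem, 0x00, 0x30, 10, width=2))    # (20, 68, 0): REP MOVSW
```

Both calls leave SI and DI at the same final offsets; the word version simply reaches them in half as many iterations.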


STOS and LODS instructions 


The STOSB instruction stores the byte in the AL register into memory locations 
pointed at by ES:DI and increments (if DF = 0) DI once. If DF = 1, then DI is decrement- 
ed. The STOSW instruction stores the contents of AX in memory locations ES:DI and 
ES:DI+1 (AL into ES:DI and AH into ES:DI+1), then increments DI twice (if DF = 0). If 
DF = 1, DI is decremented twice. 

The LODSB instruction loads the contents of memory locations pointed at by 
DS:SI into AL and increments (or decrements) SI once if DF = 0 (or DF = 1). LODSW 
loads the contents of memory locations pointed at by DS:SI into AL and DS:SI+1 into AH. 
The SI is incremented twice if DF = 0. Otherwise, it is decremented twice. LODS is never 
used with a REP prefix. 


Testing memory using STOSB and LODSB 


Example 6-14 uses the string instructions STOS and LODS to test an area of RAM
memory. In the program in Example 6-14, AAH is first written into 100 locations by using
the word-sized operand AAAAH and a count of 50. In the test part, LODSB brings the
contents of the memory locations into AL one by one, and each time AL is exclusive-ORed
with AAH (the AH register still has the hex value of AA). If they are the same, ZF = 1 and
the process is continued. Otherwise, the pattern written there by the previous routine is not
there, the loop is exited, and the error message is displayed. In concept, this is somewhat
similar to the routine used in the IBM PC's BIOS, except that the BIOS routine is much
more involved: it uses several different patterns of data for the test, and it can be used to
test any part of RAM, either the main RAM or the video RAM.




Example 6-14 


Write a program that: 

(1) Uses STOSB to store byte AAH in 100 memory locations. 

(2) Uses LODS to test the contents of each location to see if AAH is there. If the test fails, the 
system should display the message "bad memory". 


Solution: 


Assuming that ES and DS have been assigned in the ASSUME directive, the following is from 
the code segment: 


;PUT PATTERN AAAAH INTO 50 WORD LOCATIONS
        MOV     AX,DTSEG                ;INITIALIZE
        MOV     DS,AX                   ;DS REG
        MOV     ES,AX                   ;AND ES REG
        CLD                             ;CLEAR DF FOR INCREMENT
        MOV     CX,50                   ;LOAD THE COUNTER (50 WORDS)
        MOV     DI,OFFSET MEM_AREA      ;LOAD THE POINTER FOR DESTINATION
        MOV     AX,0AAAAH               ;LOAD THE PATTERN
        REP     STOSW                   ;REPEAT UNTIL CX=0
;BRING IN THE PATTERN AND TEST IT ONE BY ONE
        MOV     SI,OFFSET MEM_AREA      ;LOAD THE POINTER FOR SOURCE
        MOV     CX,100                  ;LOAD THE COUNT (COUNT 100 BYTES)
AGAIN:  LODSB                           ;LOAD INTO AL FROM DS:SI
        XOR     AL,AH                   ;IS PATTERN THE SAME?
        JNZ     OVER                    ;IF NOT THE SAME THEN EXIT
        LOOP    AGAIN                   ;CONTINUE UNTIL CX=0
        JMP     EXIT                    ;EXIT PROGRAM
OVER:   MOV     AH,09                   ;{ DISPLAY
        MOV     DX,OFFSET MESSAGE       ;{ THE MESSAGE
        INT     21H                     ;{ ROUTINE
EXIT:


The REPZ and REPNZ prefixes 


These prefixes can be used with the CMPS and SCAS instructions for testing pur- 
poses. They are explained below. 


REPZ (repeat zero), which is the same as REPE (repeat equal), will repeat the 
string operation as long as the source and destination operands are equal (ZF = 1) or until 
CX becomes zero. 

REPNZ (repeat not zero), which is the same as REPNE (repeat not equal), will
repeat the string operation as long as the source and destination operands are not equal (ZF
= 0) or until CX becomes zero. These two prefixes will be used in the context of
applications after the explanation of the CMPS and SCAS instructions.

CMPS (compare string) allows the comparison of two arrays of data pointed at 
by the SI and DI registers. One can test for the equality or inequality of data by use of the 
REPE or REPNE prefixes, respectively. The comparison can be performed a byte at a 
time or a word at a time by using CMPSB or CMPSW.




For example, if comparing "Euorop" and "Europe" for equality, the comparison 
will continue using the REPE CMPS as long as the two arrays are the same. 


;from the data segment:
DATA1   DB      'Europe'
DATA2   DB      'Euorop'
;from the code segment:
        CLD                             ;DF=0 for increment
        MOV     SI,OFFSET DATA1         ;SI=DATA1 offset
        MOV     DI,OFFSET DATA2         ;DI=DATA2 offset
        MOV     CX,06                   ;load the counter
        REPE    CMPSB                   ;repeat until not equal or CX=0


Example 6-15


Assuming that there is a spelling of "Europe" in an electronic dictionary and a user types in
"Euorope", write a program that compares these two and displays the following message,
depending on the result:
1. If they are equal, display "The spelling is correct".
2. If they are not equal, display "Wrong spelling".


Solution:


DAT_DICT   DB   'Europe'
DAT_TYPED  DB   'Euorope'
MESSAGE1   DB   'The spelling is correct','$'
MESSAGE2   DB   'Wrong spelling','$'
;from the code segment:
        CLD                             ;DF=0 FOR INCREMENT
        MOV     SI,OFFSET DAT_DICT      ;SI=DATA1 OFFSET
        MOV     DI,OFFSET DAT_TYPED     ;DI=DATA2 OFFSET
        MOV     CX,06                   ;LOAD THE COUNTER
        REPE    CMPSB                   ;REPEAT AS LONG AS EQUAL OR UNTIL CX=0
        JE      OVER                    ;IF ZF=1 THEN DISPLAY MESSAGE1
        MOV     DX,OFFSET MESSAGE2      ;IF ZF=0 THEN DISPLAY MESSAGE2
        JMP     DISPLAY
OVER:   MOV     DX,OFFSET MESSAGE1
DISPLAY: MOV    AH,09
        INT     21H


In the case above, the two arrays are to be compared letter by letter. The first char- 
acters pointed at by SI and DI are compared. In this case they are the same ("E"), so the 
zero flag is set to 1 and both SI and DI are incremented. Since ZF = 1, the REPE prefix
repeats the comparison. This process is repeated until the third letter is reached. The third 
letters "o" and "r" are not the same; therefore, ZF is reset to zero and the comparison will 
stop. ZF can be used to make the decision as shown in Example 6-15. 
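The letter-by-letter walk described above can be traced with a short model. The Python sketch below assumes DF = 0 and a single flat memory array standing in for DS and ES; the function name repe_cmpsb and the offsets 0 and 16 chosen for the two strings are hypothetical.

```python
def repe_cmpsb(mem, si, di, cx, zf=0):
    """Model REPE CMPSB in one flat segment; return (SI, DI, CX, ZF)."""
    while cx != 0:
        zf = int(mem[si] == mem[di])   # CMPSB sets ZF from the comparison
        si += 1                        # DF = 0, so both pointers increment
        di += 1
        cx -= 1
        if zf == 0:                    # REPE stops at the first mismatch
            break
    return si, di, cx, zf


mem = bytearray(32)
mem[0:6] = b"Europe"        # dictionary spelling at offset 0
mem[16:23] = b"Euorope"     # typed spelling at offset 16
si, di, cx, zf = repe_cmpsb(mem, 0, 16, 6)
print(si, di, cx, zf)       # 3 19 3 0
print("The spelling is correct" if zf else "Wrong spelling")
```

Notice that the pointers have already moved past the mismatching pair when the loop stops, and that CX still holds the number of characters left unchecked.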

One could juggle the code in Example 6-15 to make it more efficient and use 
fewer jumps, but for the sake of clarity it is presented in this manner. 

CMPS can be used to test inequality of two arrays using "REPNE CMPSB". For 
example, when comparing the following social security numbers, the comparison will 
continue to the last digit since no two digits in the same position are the same. 


        231-24-7659
        564-77-1338




SCASB (scan string) 


The SCASB string instruction compares each byte of the array pointed at by 
ES:DI with the contents of the AL register, and depending on which prefix, REPE or 
REPNE, is used, a decision is made for equality or inequality. For example, in the array 
"Mr. Gones", one can scan for the letter "G" by loading the AL register with the character 
"G" and then using the "REPNE SCASB" operation to look for that letter. 


;in the data segment:
DATA1   DB      'Mr. Gones'
;and in the code segment:
        CLD                             ;DF=0 FOR INCREMENT
        MOV     DI,OFFSET DATA1         ;DI=ARRAY OFFSET
        MOV     CX,09                   ;LENGTH OF ARRAY
        MOV     AL,'G'                  ;SCANNING FOR THE LETTER 'G'
        REPNE   SCASB                   ;REPEAT THE SCANNING IF NOT EQUAL
                                        ;OR UNTIL THE CX IS ZERO


Example 6-16 


Write a program that scans the name "Mr. Gones" and replaces the "G" with the letter "J", 
then displays the corrected name. 


Solution: 


;in the data segment:
DATA1   DB      'Mr. Gones','$'
;and in the code segment:
        MOV     AX,@DATA
        MOV     DS,AX
        MOV     ES,AX
        CLD                             ;DF=0 FOR INCREMENT
        MOV     DI,OFFSET DATA1         ;ES:DI=ARRAY OFFSET
        MOV     CX,09                   ;LENGTH OF ARRAY
        MOV     AL,'G'                  ;SCANNING FOR THE LETTER 'G'
        REPNE   SCASB                   ;REPEAT THE SCANNING IF NOT EQUAL OR
        JNE     OVER                    ;UNTIL CX IS ZERO. JUMP IF Z=0
        DEC     DI                      ;DECREMENT TO POINT AT 'G'
        MOV     BYTE PTR [DI],'J'       ;REPLACE 'G' WITH 'J'
OVER:   MOV     AH,09                   ;DISPLAY
        MOV     DX,OFFSET DATA1         ;THE
        INT     21H                     ;CORRECTED NAME


In Example 6-16, the letter "G" is compared with "M". Since they are not equal, 
DI is incremented and CX is decremented, and the scanning is repeated until the letter "G" 
is found or the CX register is zero. In this example, since "G" is found, ZF is set to 1 (ZF 
= 1), indicating that there is a letter "G" in the array. 


Replacing the scanned character 


SCASB can be used to search for a character in an array, and if it is found, it will 
be replaced with the desired character. See Example 6-16. 

In string operations the pointer is incremented after each execution (that is, if DF 
= 0). Therefore, in the example above, DI must be decremented, causing the pointer to 
point to the scanned character and then replace it. 
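The need for the decrement becomes clear when the scan is traced in a small model. The Python sketch below assumes DF = 0 and represents the array as a bytearray indexed from offset 0; the function name repne_scasb is hypothetical.

```python
def repne_scasb(mem, di, cx, al, zf=0):
    """Model REPNE SCASB; return (DI, CX, ZF) when the scan stops."""
    while cx != 0:
        zf = int(mem[di] == al)   # SCASB compares AL with the byte at ES:DI
        di += 1                   # DI moves on even when a match is found
        cx -= 1
        if zf == 1:               # REPNE stops at the first match
            break
    return di, cx, zf


name = bytearray(b"Mr. Gones")
di, cx, zf = repne_scasb(name, 0, len(name), ord("G"))
if zf:
    name[di - 1] = ord("J")       # same effect as DEC DI, then storing 'J'
print(di, cx, zf, name.decode())  # 5 4 1 Mr. Jones
```

The letter "G" sits at offset 4, but DI is 5 when the scan stops, which is why the replacement is written one position back.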




XLAT instruction and look-up tables 


There is often a need in computer applications for a table that holds some impor- 
tant information. To access the elements of the table, 8088/86 microprocessors provide the 
XLAT (translate) instruction. To understand the XLAT instruction, one must first under- 
stand tables. The table is commonly referred to as a look-up table. Assume that one needs 
a table for the values of x², where x is between 0 and 9. First the table is generated and
stored in memory: 


SQUR_TABLE      DB      0,1,4,9,16,25,36,49,64,81

Now one can access the square of any number from 0 to 9 by the use of XLAT. 
To do that, the register BX must have the offset address of the look-up table, and the num- 
ber whose square is sought must be in the AL register. Then after the execution of XLAT, 
the AL register will have the square of the number. The following shows how to get the 
square of 5 from the table: 


        MOV     BX,OFFSET SQUR_TABLE    ;load the offset address of table
        MOV     AL,05                   ;AL=05 will retrieve 6th element
        XLAT                            ;pull the element out of table
                                        ;and put in AL


After execution of this program, the AL register will have 25 (19H), the square of 
5. It must be noted that for XLAT to work, the entries of the look-up table must be in 
sequential order and must have a one-to-one relation with the element itself. This is 
because of the way XLAT works. In actuality, XLAT is one instruction, which is equiva- 
lent to the following code: 


        SUB     AH,AH           ;AH=0
        MOV     SI,AX           ;SI=000X
        MOV     AL,[BX+SI]      ;GET THE SIth ENTRY FROM BEGINNING
                                ;OF THE TABLE POINTED AT BY BX


In other words, if there was no XLAT instruction, the code above would do the 
same thing, and this is the way many RISC processors perform this operation. Now why 
would one want to use XLAT to get the square of a number from a look-up table when 
there is the MUL instruction? The answer is that MUL takes longer. 
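The table look-up itself is a single indexed read, which the following Python sketch makes explicit. It models memory as a bytearray and places the tables at arbitrary offsets; the function name xlat and the offsets 10H and 20H are hypothetical, chosen only for illustration.

```python
def xlat(mem, bx, al):
    """Model XLAT: return the byte at offset BX + AL (AL treated as unsigned)."""
    return mem[(bx + (al & 0xFF)) & 0xFFFF]


mem = bytearray(0x40)

SQUR_TABLE = 0x10                                 # assumed offset of the table
mem[SQUR_TABLE:SQUR_TABLE + 10] = bytes(x * x for x in range(10))
print(xlat(mem, SQUR_TABLE, 5))                   # 25

ASC_TABL = 0x20                                   # assumed offset of a hex-to-ASCII table
mem[ASC_TABL:ASC_TABL + 16] = b"0123456789ABCDEF"
print(chr(xlat(mem, ASC_TABL, 0x0C)))             # C
```

The second table shows that the same instruction serves for code conversion: the value in AL is only an index, and the table decides what it is translated into.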


Code conversion using XLAT 
In many microprocessor-based systems, the keyboard is not an ASCII type of keyboard.
One can use XLAT to translate the hex keys of such keyboards to ASCII. Assuming
that the keys are 0-F, the following is the program to convert the hex digits of 0-F to their
ASCII equivalents.


;data segment:
ASC_TABL DB     '0','1','2','3','4','5','6','7'
         DB     '8','9','A','B','C','D','E','F'
HEX_VALU DB     ?
ASC_VALU DB     ?
;code segment:
        MOV     BX,OFFSET ASC_TABL      ;BX=TABLE OFFSET
        MOV     AL,HEX_VALU             ;AL=THE HEX DATA
        XLAT                            ;GET THE ASCII EQUIVALENT
        MOV     ASC_VALU,AL             ;MOVE IT TO MEMORY


Review Questions 


1. In string operations, register ____ is used to point to the source operand and register
____ is used to point to the destination operand.
2. SI is used as an offset into the ____ segment, and DI is used as an offset into the
____ segment.
3. The ____ flag, bit ____ of the flag register, is used to tell the CPU whether to
increment or decrement pointers in repeated string operations.
4. State the purpose of instructions CLD and STD.
5. If a string instruction is repeatedly executed because of a REP prefix, how does the
CPU know when to stop repeating it?
6. In the following program segment, what condition will cause the REPNZ to fail?
        MOV     SI,OFFSET DATA1
        MOV     DI,OFFSET DATA2
        MOV     CX,LENGTH
        REPNZ   CMPSB


PROBLEMS 


SECTION 6.1: SIGNED NUMBER ARITHMETIC OPERATIONS 


1. Show how the x86 computer would represent the following numbers and verify each
with DEBUG.
(a) -23 (b) +12 (c) -28H (d) +6FH
(e) -128 (f) +127 (g) +365 (h) -32,767


2. Find the overflow flag for each case and verify the result using DEBUG.
(a) (+15) + (-12) (b) (-123) + (-127) (c) (+25H) + (+34)
(d) (-127) + (+127) (e) (+1000) + (-100)
3. Sign-extend the following and write simple programs in DEBUG to verify them.
(a) -122 (b) -999H (c) +17H
(d) +127 (e) -129
4. Modify Program 6-2 to find the highest temperature. Verify your program. 


SECTION 6.2: STRING AND TABLE OPERATIONS 


5. Which instructions are used to set and reset the direction flag? State the purpose of 
the direction flag. 

6. The REP prefix can be used with which of the following instructions?

(a) MOVSB (b) MOVSW (c) CMPSB 
(d) LODSB (e) STOSW (f) SCASW 

7. In Problem 6, state the source and destination operand for each instruction. 

8. Write and verify a program that transfers a block of 200 words of data. 

9. Use instructions LODSx and STOSx to mask the 3 from a set of 50 ASCII digits and 
transfer the result to a different memory location. This involves converting from 
ASCII to unpacked BCD, then storing it at a different location; for example, 

source destination 
ASCII for '5' 0011 0101 0000 0101 

10. Which prefix is used for the inequality case for CMPS and SCAS instructions? 

11. Write a program that scans the initials "IbM" and replaces the lowercase "b" with 
uppercase "B". 

12. Using the timing chart in Appendix B.2, compare the clock count of the instruction 
XLAT and its equivalent to see which is more efficient. 

13. Write a program using a look-up table and XLAT to retrieve the y value in the
equation y = x² + 2x + 5 for x values of 0 to 9.




ANSWERS TO REVIEW QUESTIONS 


SECTION 6.1: SIGNED NUMBER ARITHMETIC OPERATIONS 


1. D7, D15
2. 16H = 0001 0110 (binary); in 2's complement: 1110 1010 (binary)
3. -128 to +127; -32,768 to +32,767 (decimal)
4. An overflow is a carry into the sign bit; a carry is a carry out of the register.
5. The CBW instruction sign extends the sign bit of a byte into a word; the CWD
instruction sign extends the sign bit of a word into a doubleword.
F6H sign-extended into AX = FFF6H.
124CH sign-extended into DX AX would be DX = 0000 and AX = 124CH.
6. IMUL, IDIV
7. SHR shifts each bit right one position and fills the MSB with zero.
SAR shifts each bit right one position and fills the MSB with the sign bit;
in each, the LSB is shifted into the carry flag.
8. (a) JLE will jump if OF is the inverse of SF, or if ZF = 1.
(b) JG will jump if OF equals SF and ZF = 0.


SECTION 6.2: STRING AND TABLE OPERATIONS 


1. SI, DI
2. Data, extra
3. Direction, 11 or D10
4. CLD clears DF to 0; STD sets DF to 1
5. When CX = 0
6. If CX = 0 or the point at which DATA1 and DATA2 are equal


CHAPTER 7 


MODULES AND MODULAR 
PROGRAMMING 


OBJECTIVES 
Upon completion of this chapter, you will be able to: 


>> Discuss the advantages of modular programming 

>> Break large programs into modules, code modules and calling programs 

>> Declare names that are defined externally via the EXTRN directive 

>> Link subprograms together into one executable program 

>> Code segment directives to link data, code, or stack segments from 
different modules into one segment 

>> Code programs using the full segment definitions 

>> List the various methods of passing parameters to modules and discuss 
the advantages and disadvantages of each 


>> Code programs passing the parameters via registers, memory, or stack 




In this chapter the concept of modules is presented. In Section 7.1, modules are 
discussed along with rules for writing modules and linking them together. In Sections 7.2 
and 7.3, some very useful modules are given, along with the methods of passing parame- 
ters among various modules. 


SECTION 7.1: WRITING AND LINKING MODULES 


Why modules? 


It is common practice in writing software packages to break down the project into 
small modules and distribute the task of writing those modules among several program- 
mers. This not only makes the project more manageable but also has other advantages, 
such as: 


1. Each module can be written, debugged, and tested individually.
2. The failure of one module does not stop the entire project.
3. The task of locating and isolating any problem is easier and less time consuming.
4. One can use the modules to link with high-level languages such as C/C++, C#, or
   Visual Basic.
5. Parallel development shortens considerably the time required to complete a project.


In this section we explain how to write and link modules to create a single exe- 
cutable program. 


Writing modules 


In previous chapters, a main procedure was written that called many other subrou- 
tines. In those examples, if one subroutine did not work properly, the entire program 
would have to be rewritten and reassembled. A more efficient way to develop software is 
to treat each subroutine as a separate program (or module) with a separate filename. Then 
each one can be assembled and tested. After testing each program and making sure that 
each works, they can all be brought together (linked) to make a single program. To enable 
these modules to be linked together, certain Assembly language directives must be used. 
Among these directives, the two most widely used are EXTRN (external) and PUBLIC. 
Each is discussed below. 


EXTRN directive 


The EXTRN directive is used to notify the assembler and linker that certain 
names and variables which are not defined in the present module are defined externally 
somewhere else. In the absence of the EXTRN directive, the assembler would show an 
error since it cannot find where the names are defined. The EXTRN directive has the fol- 
lowing format: 


          EXTRN   name1:type               ;each name in a separate EXTRN
          EXTRN   name2:type
          EXTRN   name1:type,name2:type    ;or many listed in the same EXTRN


External procedure names can be NEAR, FAR, or PROC (which will be NEAR 
for small models or FAR for larger models). The following are the types for data names, 
with the number of bytes indicated in parentheses: BYTE (1), WORD (2), DWORD (4), 
FWORD (6), QWORD (8), or TBYTE (10). 


PUBLIC directive 


Those names or parameters defined as EXTRN (indicating that they are defined
outside the present module) must be defined as PUBLIC in the module where they are
defined. Defining a name as PUBLIC allows the assembler and linker to match it with its
EXTRN counterpart(s). The following is the format for the PUBLIC directive:

          PUBLIC  name1            ;each name can be in a separate directive
          PUBLIC  name2
          PUBLIC  name1,name2      ;or many can be listed in the same PUBLIC


Example 7-1 should help to clarify these concepts. It demonstrates that for every 
EXTRN definition there is a PUBLIC directive defined in another module. In Example 
7-1 the EXTRN and PUBLIC directives are related to the name of a FAR procedure. 


Example 7-1

Assume there is a program that constitutes the main routine, and two smaller subroutines named
SUBPROG1 and SUBPROG2. The subprograms are called from the main routine. The following
shows the use of the EXTRN and PUBLIC directives:


Solution:


;one file will contain the main module:
          EXTRN   SUBPROG1:FAR
          EXTRN   SUBPROG2:FAR
          .MODEL  SMALL
          .CODE
MAIN      PROC    FAR
          ...
          CALL    SUBPROG1
          CALL    SUBPROG2
          ...
          MOV     AH,4CH
          INT     21H
MAIN      ENDP
          END     MAIN
;---------- and in a separate file: ----------
          PUBLIC  SUBPROG1
          .MODEL  SMALL
          .CODE
SUBPROG1  PROC    FAR
          ...
          RET
SUBPROG1  ENDP
          END
;---------- and in another file: ----------
          PUBLIC  SUBPROG2
          .MODEL  SMALL
          .CODE
SUBPROG2  PROC    FAR
          ...
          RET
SUBPROG2  ENDP
          END


END directive in modules 


In Example 7-1, notice the entry and exit points of the program. The entry point
is MAIN and the exit point is "END MAIN". Modules that are called by the main module
have the END directive with no label or name after it. Notice that SUBPROG1 and
SUBPROG2 each have the END directive with no labels after them.




Linking modules together into one executable unit 


Assuming that each program module in Example 7-1 is assembled separately and 
saved under the filenames EXAMPLE1.OBJ, PROC1.OBJ, and PROC2.OBJ, the follow- 
ing shows how to link them together in MASM in order to generate a single executable 
file: 


C>LINK EXAMPLE1.OBJ + PROC1.OBJ + PROC2.OBJ


Program 7-1 shows how the EXTRN and PUBLIC directives can also be applied 
to data variables. In Program 7-1, the main module contains a data segment and a stack 
segment, but the subroutine modules do not. Each module can have its own data and stack 
segment. While it is entirely permissible and possible that the modules have their own data 
segments if they need them, generally there is only one stack that is defined in the main 
program and it must be defined so that it is combined with the system stack. Later in this 
chapter we show how to combine many segments of different modules to generate one, 
uniform segment for each segment of code, data, and stack. 


Use the program shells in Example 7-1 to: 
1. Add two words. 

2. Multiply two words. 

Each one should be performed by a separate module. The data is defined in the main module, 
and the add and multiply modules have no data segment of their own. 


TITLE     PROG7-1MM DEMONSTRATES MODULAR PROGRAMMING
PAGE      60,132
          EXTRN   SUBPROG1:FAR
          EXTRN   SUBPROG2:FAR
          PUBLIC  VALUE1,VALUE2,SUM,PRODUCT
          .MODEL  SMALL
          .STACK  64
          .DATA
VALUE1    DW      2050
VALUE2    DW      500
SUM       DW      2 DUP (?)
PRODUCT   DW      2 DUP (?)
          .CODE
MAIN      PROC    FAR
          MOV     AX,@DATA
          MOV     DS,AX
          CALL    SUBPROG1         ;CALL SUBPROG TO ADD VALUE1 + VALUE2
          CALL    SUBPROG2         ;CALL SUBPROG TO MUL VALUE1 * VALUE2
          MOV     AH,4CH
          INT     21H              ;GO BACK TO OS
MAIN      ENDP
          END     MAIN


Program 7-1: Main Module 


Analysis of Program 7-1 


Notice in the main module that each of the two subroutines was declared with the
EXTRN directive, indicating that these procedures would be defined in another file. The
external subroutines were defined as FAR in this case. In the files where each subroutine
is defined, it is declared as PUBLIC, so that other programs can call it. In the main module,
the names VALUE1, VALUE2, SUM, and PRODUCT were defined as PUBLIC, so
that other programs could access these data items. In the subprograms, these data items
were declared as EXTRN. These three programs would be linked together as follows:

C>LINK PROG7-1MM.OBJ + PROG7-1M2 + PROG7-1M3




The linker program resolves external references by matching PUBLIC and 
EXTRN names. The linker program will search through the files specified in the LINK 
command for the external subroutines. Notice that the filenames are unrelated to the pro- 
cedure names. "MAIN" is contained in file "PROG7-1MM.OBJ". 


;THIS PROGRAM FINDS THE SUM OF TWO EXTERNALLY DEFINED WORDS
;AND STORES THE SUM IN A LOCATION DEFINED BY THE CALLING MODULE
TITLE     PROG7-1M2 PROGRAM TO ADD TWO WORDS
PAGE      60,132
          EXTRN   VALUE1:WORD
          EXTRN   VALUE2:WORD
          EXTRN   SUM:WORD
          PUBLIC  SUBPROG1
          .MODEL  SMALL
          .CODE
SUBPROG1  PROC    FAR
          SUB     BX,BX            ;INITIALIZE CARRY COUNT
          MOV     AX,VALUE1
          MOV     DX,VALUE2
          ADD     AX,DX            ;ADD VALUE1 + VALUE2
          ADC     BX,00            ;ACCUMULATE CARRY
          MOV     SUM,AX           ;STORE SUM
          MOV     SUM+2,BX         ;STORE CARRY
          RET
SUBPROG1  ENDP
          END


Program 7-1: Module 2 


;THIS PROGRAM FINDS THE PRODUCT OF TWO EXTERNALLY DEFINED WORDS
;AND STORES THE PRODUCT IN A LOCATION DEFINED BY THE CALLING MODULE
TITLE     PROG7-1M3 PROGRAM TO MULTIPLY TWO WORDS
PAGE      60,132
          EXTRN   VALUE1:WORD
          EXTRN   VALUE2:WORD
          EXTRN   PRODUCT:WORD
          PUBLIC  SUBPROG2
          .MODEL  SMALL
          .CODE
SUBPROG2  PROC    FAR
          MOV     AX,VALUE1
          MOV     CX,VALUE2
          MUL     CX               ;MULTIPLY VALUE1 * VALUE2
          MOV     PRODUCT,AX       ;STORE PRODUCT
          MOV     PRODUCT+2,DX     ;STORE PRODUCT HIGH WORD
          RET
SUBPROG2  ENDP
          END


Program 7-1: Module 3 


Example 7-2 shows the shell of modular programs using the simplified segment 
definition. Modular programming with full segment definition is defined later in this sec- 
tion. Notice that in the main module of Example 7-2, the name MAIN has a colon after it 
and is used for the first executable instruction. This is the entry point of the program. The 
exit point of the program is indicated by the same label, which must be named in the END 
directive. No program can have more than one entry and one exit point. The label MAIN 
was chosen in this instance, but of course any name could have been chosen. Remember 
that the END directives in other modules do not have a label after the word "END". 




Example 7-2

Create a shell for modular programming using the simplified segment definition.


Solution:


Modular program shells for the simplified segment directives are as follows.
The main file will contain:

          .MODEL  SMALL
          .STACK  64
          .DATA
          ...
          .CODE
          EXTRN   SUBPROG1:NEAR
          EXTRN   SUBPROG2:NEAR
MAIN:     MOV     AX,@DATA         ;this is the program entry point
          MOV     DS,AX
          CALL    SUBPROG1
          CALL    SUBPROG2
          MOV     AH,4CH
          INT     21H
          END     MAIN             ;this is the program exit point
;---------- and in a separate file: ----------
          .MODEL  SMALL
          .CODE
          PUBLIC  SUBPROG1
SUBPROG1  PROC
          ...
          RET
SUBPROG1  ENDP
          END
;---------- and in another file: ----------
          .MODEL  SMALL
          .CODE
          PUBLIC  SUBPROG2
SUBPROG2  PROC
          ...
          RET
SUBPROG2  ENDP
          END


Program 7-2 is the same as Program 7-1, rewritten for the full segment definition. 
Compare the two programs to see the ease of the simplified segment definition. When 
using the simplified segment definition shown in Example 7-2, procedures will default to 
NEAR for small or compact models and to FAR for medium, large, or huge models. 


Modular programming and full segment definition 


Program 7-2 uses full segment definition to redefine all the segments of Program 
7-1. An analysis of how the segments are combined as shown in the link map follows the 
program. The code segments were not made PUBLIC in this example. Notice that in order 
to combine various segments from different modules into one segment, the segment 
names must be the same. 




TITLE     PROG7-2MM PROG7-1 REWRITTEN WITH FULL SEGMENT DEFINITION
PAGE      60,132
          EXTRN   SUBPROG1:FAR
          EXTRN   SUBPROG2:FAR
          PUBLIC  VALUE1,VALUE2,SUM,PRODUCT

STSEG     SEGMENT PARA STACK 'STACK'
          DB      100 DUP (?)
STSEG     ENDS

DTSEG     SEGMENT PARA 'DATA'
VALUE1    DW      2050
VALUE2    DW      500
SUM       DW      2 DUP (?)
PRODUCT   DW      2 DUP (?)
DTSEG     ENDS

CODSG_A   SEGMENT PARA 'CODE'
MAIN      PROC    FAR
          ASSUME  CS:CODSG_A,DS:DTSEG,SS:STSEG
          MOV     AX,DTSEG
          MOV     DS,AX
          CALL    SUBPROG1         ;CALL SUBPROG TO ADD VALUE1 + VALUE2
          CALL    SUBPROG2         ;CALL SUBPROG TO MUL VALUE1 * VALUE2
          MOV     AH,4CH
          INT     21H              ;GO BACK TO OS
MAIN      ENDP
CODSG_A   ENDS
          END     MAIN


Program 7-2: Main Module 


;THIS PROGRAM FINDS THE SUM OF TWO EXTERNALLY DEFINED WORDS
;AND STORES THE SUM IN A LOCATION DEFINED BY THE CALLING MODULE
TITLE     PROG7-2M2 PROGRAM TO ADD TWO WORDS
PAGE      60,132
          EXTRN   VALUE1:WORD
          EXTRN   VALUE2:WORD
          EXTRN   SUM:WORD
          PUBLIC  SUBPROG1
CODSG_B   SEGMENT PARA 'CODE'
SUBPROG1  PROC    FAR
          ASSUME  CS:CODSG_B
          SUB     BX,BX            ;INITIALIZE CARRY COUNT
          MOV     AX,VALUE1
          MOV     DX,VALUE2
          ADD     AX,DX            ;ADD VALUE1 + VALUE2
          ADC     BX,00            ;ACCUMULATE CARRY
          MOV     SUM,AX           ;STORE SUM
          MOV     SUM+2,BX         ;STORE CARRY
          RET
SUBPROG1  ENDP
CODSG_B   ENDS
          END


Program 7-2: Module 2 




;THIS PROGRAM FINDS THE PRODUCT OF TWO EXTERNALLY DEFINED WORDS
;AND STORES THE PRODUCT IN A LOCATION DEFINED BY THE CALLING MODULE
TITLE     PROG7-2M3 PROGRAM TO MULTIPLY TWO WORDS
PAGE      60,132
          EXTRN   VALUE1:WORD
          EXTRN   VALUE2:WORD
          EXTRN   PRODUCT:WORD
          PUBLIC  SUBPROG2
CODSG_C   SEGMENT PARA 'CODE'
SUBPROG2  PROC    FAR
          ASSUME  CS:CODSG_C
          MOV     AX,VALUE1
          MOV     CX,VALUE2
          MUL     CX               ;MUL VALUE1 * VALUE2
          MOV     PRODUCT,AX       ;STORE PRODUCT
          MOV     PRODUCT+2,DX     ;STORE PRODUCT HIGH WORD
          RET
SUBPROG2  ENDP
CODSG_C   ENDS
          END

Start     Stop      Length    Name      Class
00000H    00063H    00064H    STSEG     STACK
00070H    0007BH    0000CH    DTSEG     DATA
00080H    00092H    00013H    CODSG_A   CODE
000A0H    000B5H    00016H    CODSG_B   CODE
000C0H    000D0H    00011H    CODSG_C   CODE


Program 7-2: Module 3 and the Link Map


Analysis of Program 7-2 link map


The link map shows the start and end of each segment. Notice that each segment 
starts at a 16-byte boundary: 00070H, 00080H, etc. The code segment for the main mod- 
ule has the name "CODSG_A", starts at 00080H, and ends at 00092H, taking a total of 
00013H bytes. It was classified as 'CODE'. The next code segment is defined under the 
name "CODSG_B". Notice that it starts at the 16-byte boundary 000A0H since it was 
defined as PARA. This means that from 00093H to 0009FH is unused. Similarly, the third 
module starts at 000C0H. Notice that each code segment is separate. They can all be 
merged together into one segment by using the PUBLIC option. This is shown in 
Example 7-3. To merge the code segments together, each code segment must have the 
same name and be declared PUBLIC. 


Example 7-3 


Show the link map for Program 7-2 rewritten to combine code segments (use PARA boundaries) using
directive: CDSEG SEGMENT PARA PUBLIC 'CODE'


Solution:

Start     Stop      Length    Name      Class
00000H    00063H    00064H    STSEG     STACK
00070H    0007BH    0000CH    DTSEG     DATA
00080H    000D0H    00051H    CDSEG     CODE


The following are the SEGMENT directives using word boundaries:

STSEG     SEGMENT WORD STACK 'STACK'
DTSEG     SEGMENT WORD 'DATA'
CDSEG     SEGMENT WORD PUBLIC 'CODE'

The following is the link map when the program used WORD boundaries:

Start     Stop      Length    Name      Class
00000H    00063H    00064H    STSEG     STACK
00064H    0006FH    0000CH    DTSEG     DATA
00070H    000AAH    0003BH    CDSEG     CODE


SEGMENT directive 


In previous chapters, when a segment was defined using full segment definition, 
no other attributes were mentioned after it. It was simply written 


name SEGMENT 


This kind of definition of segments was acceptable since there was only one of 
each segment of code, data, and stack. However, when there are many modules to be 
linked together, the segment definition must be adjusted. The complete segment definition 
used widely in modular programming is as follows: 


name SEGMENT alignment combine type class name 


Appendix C (see SEGMENT) gives a complete description of the fields of the 
SEGMENT directive. A brief explanation of each field is given below. 

The alignment field indicates whether a segment should start on a byte, word, 
paragraph, or page boundary. For example, if WORD is given in the alignment field, the 
segment will start at the next available word. When the WORD boundary is used, if a pre- 
vious segment ended at offset 0048H, the next segment will start at 004AH. The default 
alignment is PARA, meaning that each segment will start on a paragraph boundary. A 
paragraph in OS is defined as 16 bytes; therefore, each segment will start on a 16-byte 
boundary. When PARA is used, if the previous segment ended at 0048H, the next segment 
would begin at the next paragraph boundary, which is 0050H. Paragraph boundaries end 
in 0; they are evenly divisible by 16 (10H). 
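
The boundary arithmetic described above can be checked with a few instructions. The following is a minimal sketch, not one of the book's programs: the names PREV_END, NEXT_W, and NEXT_P are hypothetical, and 0048H is the ending offset used in the paragraph above. It assumes the offsets are small enough that the additions do not wrap past FFFFH.

```asm
;Sketch: where does the next segment start after one that ends at 0048H?
;PREV_END, NEXT_W, and NEXT_P are hypothetical names used only here.
          .MODEL  SMALL
          .STACK  64
          .DATA
PREV_END  DW      0048H            ;last offset used by the previous segment
NEXT_W    DW      ?                ;start of next segment with WORD alignment
NEXT_P    DW      ?                ;start of next segment with PARA alignment
          .CODE
MAIN      PROC    FAR
          MOV     AX,@DATA
          MOV     DS,AX
          MOV     AX,PREV_END
          INC     AX               ;AX = first free offset (0049H)
          MOV     BX,AX            ;keep a copy for the second calculation
          INC     AX               ;WORD: add 1, then clear bit 0
          AND     AX,0FFFEH        ;AX = 004AH
          MOV     NEXT_W,AX
          MOV     AX,BX
          ADD     AX,0FH           ;PARA: add 15, then clear the low 4 bits
          AND     AX,0FFF0H        ;AX = 0050H
          MOV     NEXT_P,AX
          MOV     AH,4CH
          INT     21H              ;GO BACK TO OS
MAIN      ENDP
          END     MAIN
```

Rounding up is done by adding one less than the alignment size and then clearing the low bits, so an offset that already sits on a boundary is left unchanged. The same calculation applied to the DTSEG entry of the link map (ends at 0007BH) gives 00080H, which is where CODSG_A starts.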

The combine type field indicates to the linker whether segments of the same type
should be linked together. Typical options for combine type are STACK or PUBLIC. An
example below shows how to use this field in the stack segment definition to combine the
stack segment of a program with the system stack to eliminate the "Warning: no stack
segment" message generated by the linker. If the combine type is PUBLIC, the linker will
combine that segment with segments of the same name in other modules. This can be
used to merge the code segments of several modules into a single segment.

The class name field has four options: 'CODE', 'STACK', 'DATA', and 'EXTRA'.
It must be enclosed in single quotes. It is used in combining segments of the same type 
from various modules. 


Complete stack segment definition


The following stack segment definition in the main module will eliminate the
"Warning: no stack segment" message generated by the linker:


name      SEGMENT PARA STACK 'STACK'


Complete data and code segment definitions


The following is a data segment definition that can be used if no other module has 
defined any data segment: 


name SEGMENT PARA 'DATA' 


If any other module has defined a data segment then PUBLIC should be placed 
between PARA and 'DATA'. The following are the code and data segment definitions to 
combine segments from different modules: 


name      SEGMENT PARA PUBLIC 'CODE'
name      SEGMENT PARA PUBLIC 'DATA'


Example 7-4 rewrites Example 7-2 to define segments using the complete seg- 
ment definition. 




Example 7-4 


Create a shell for modular programming using the complete segment definition. 


Solution:
The main file will contain:

TITLE     PROG    PROGRAM SHELL WITH COMPLETE SEGMENT DEFINITION
PAGE      60,132
          EXTRN   SUBPROG1:FAR
          EXTRN   SUBPROG2:FAR
          PUBLIC  ...              ;declare data here to be shared

STSEG     SEGMENT PARA STACK 'STACK'
          DB      100 DUP (?)
STSEG     ENDS

DTSEG     SEGMENT PARA 'DATA'
          ;define data here
DTSEG     ENDS

CODSG_A   SEGMENT PARA 'CODE'
MAIN      PROC    FAR
          ASSUME  CS:CODSG_A,DS:DTSEG,SS:STSEG
          MOV     AX,DTSEG
          MOV     DS,AX
          CALL    SUBPROG1         ;CALL SUBPROG
          CALL    SUBPROG2         ;CALL SUBPROG
          MOV     AH,4CH
          INT     21H              ;GO BACK TO OS
MAIN      ENDP
CODSG_A   ENDS
          END     MAIN
;---------- and in another file: ----------
TITLE     SUBPROG1 PROGRAM
PAGE      60,132
          EXTRN   ...              ;declare data that is defined externally
          PUBLIC  SUBPROG1         ;declare procedures that are called externally
CODSG_B   SEGMENT PARA 'CODE'
SUBPROG1  PROC    FAR
          ASSUME  CS:CODSG_B
          ;the instructions that perform the work of the subroutine go here
          RET
SUBPROG1  ENDP
CODSG_B   ENDS
          END
;---------- and in another file: ----------
TITLE     SUBPROG2 PROGRAM TO ...
PAGE      60,132
          EXTRN   ...              ;declare data that is defined externally
          PUBLIC  SUBPROG2         ;declare procedures that are called externally
CODSG_C   SEGMENT PARA 'CODE'
SUBPROG2  PROC    FAR
          ASSUME  CS:CODSG_C
          ;the instructions that perform the work of the subroutine go here
          RET
SUBPROG2  ENDP
CODSG_C   ENDS
          END




Review Questions 


1. List three advantages of modular programming.

2. The ________ directive is used within a module to indicate that the named variable can
   be used by another module.

3. The ________ directive is used within a module to indicate that the named variable was
   defined in another module.

4. How does the system determine the entry and exit points of a program consisting of
   more than one module?

5. What is a paragraph?

6. Write the directive used in complete segment definition that will define the stack
   segment so that it will be combined with the system stack.

7. If a word-sized data item named TOTAL was defined in module 1, code the directive
   to define TOTAL in module 2.

8. If PARA were used for the alignment type of a code segment that ended at 56H, where
   would the next code segment begin?

9. Write the code segment directives for a calling program and a module so that they will
   be combined into one code segment.


SECTION 7.2: SOME VERY USEFUL MODULES 


This section shows the development of two very useful programs that convert 
from hex to decimal, and vice versa. Then they are rewritten as modules that can be called 
from any program. Finally, the calling program is written. 


Binary (hex)-to-ASCII (decimal) conversion


The result of arithmetic operations is, of course, in binary. To display the result
in decimal, the number is first converted to decimal, and then each digit is tagged with
30H to put it in ASCII form so that it can be displayed or printed. The first step is to
convert the binary number to decimal. Look at the following example, which converts 34DH
to decimal.

34DH = (3 x 16^2) + (4 x 16^1) + (D = 13 x 16^0)
     = (3 x 256)  + (4 x 16)   + (13 x 1)
     = 768 + 64 + 13
     = 845


Another method to convert a hex number to decimal is to divide it repeatedly by
10 (0AH), storing each remainder, until the quotient is less than 10. The following steps
would be performed:


34DH / A = 54H (84 decimal)   remainder 5
 54H / A = 8                  remainder 4
       8 (< A, so the process stops)

Taking the final quotient and then the remainders in reverse order gives: 845 decimal


Program 7-3 shows the conversion process for a word-sized (16-bit) number 
using the method of repeated division demonstrated above. Since a word-sized hex num- 
ber is between 0 and FFFFH, the result in decimal can be as high as 65,535. Therefore, a 
string length of 5 should be sufficient to hold the result. The binary number to be con- 
verted is in data item BINNUM. Notice in Program 7-3 that as each decimal digit (the 
remainder) is placed in DL, it is tagged with 30H to convert it to ASCII. It is then placed 
in a memory area called ASCNUM. The ASCII digits are placed in memory so that the 
lowest digit is in high memory, as is the convention of ASCII storage in Windows OS. 




TITLE     PROG7-3 CONVERT BINARY TO ASCII
;USING SIMPLIFIED SEGMENT DEFINITION
PAGE      60,132
          .MODEL  SMALL
          .STACK  64
          .DATA
          ...
          .CODE
B2ASC_CON PROC    FAR
          MOV     AX,@DATA
          MOV     DS,AX
          MOV     BX,10                ;BX=10 THE DIVISOR
          MOV     SI,OFFSET ASCNUM     ;SI = BEGINNING OF ASCII STRING
          ADD     SI,5                 ;ADD LENGTH OF STRING
          DEC     SI                   ;SI POINTS TO LAST ASCII DIGIT
          MOV     AX,BINNUM            ;LOAD BINARY (HEX) NUMBER
BACK:     SUB     DX,DX                ;DX MUST BE 0 IN WORD DIVISION
          DIV     BX                   ;DIVIDE HEX NUMBER BY 10 (BX=10)
          OR      DL,30H               ;TAG '3' TO MAKE IT ASCII
          MOV     [SI],DL              ;MOVE THE ASCII DIGIT
          DEC     SI                   ;DECREMENT POINTER
          CMP     AX,0                 ;CONTINUE LOOPING WHILE AX > 0
          JA      BACK
          MOV     AH,4CH
          INT     21H                  ;GO BACK TO OS
B2ASC_CON ENDP
          END     B2ASC_CON


Program 7-3


ASCII (decimal)-to-binary (hex) conversion


When a user keys in digits 0 to 9, the keyboard provides the ASCII version of the 
digits to the computer. For example, when the key marked 9 is pressed, in reality the key- 
board provides its ASCII version 00111001 (39H) to the system. In Chapter 3 we showed 
how in some cases, such as addition, the numbers can be processed in ASCII and there is 
no need to convert them to hex (binary). However, in the majority of cases the number 
needs to be converted to hex in order to be processed by the CPU. Look at the example of 
converting decimal 482 to hex. The following shows the steps to convert the number to 
hex: 


482 / 16^2 = 482 / 256 = 1
482 - (1 x 256) = 226       226 / 16^1 = 226 / 16 = 14 = E
226 - (14 x 16) = 2

482 decimal = 1E2 hexadecimal


However, a computer would use a different method since it works in binary arith- 
metic, not decimal. First the 30H would be masked off each ASCII digit. Then each digit 
is multiplied by a weight (a power of 10) such as 1, 10, 100, or 1000 and they are then 
added together to get the final hex (binary) result. Converting decimal 482 to hex 
involves the following steps. First a user types in '482' through the PC ASCII keyboard,
yielding 34H, 38H, 32H, the ASCII version of 482. Then the following steps are performed:


2 x 1   =   2 =  02H
8 x 10  =  80 =  50H
4 x 100 = 400 = 190H
                ----
                1E2H hexadecimal




TITLE     PROG7-4 CONVERT ASCII TO BINARY
PAGE      60,132
          .MODEL  SMALL
          .STACK  64
          .DATA
          ...
          .CODE
ASC2B_CON PROC    FAR
          MOV     AX,@DATA
          MOV     DS,AX
          SUB     DI,DI                ;CLEAR DI FOR BINARY (HEX) RESULT
          MOV     SI,OFFSET ASCNUM     ;SI = BEGINNING OF ASCII STRING
          MOV     BL,STRLEN            ;BL = LENGTH OF ASCII STRING
          SUB     BH,BH                ;BH=0 USE BX IN BASED INDEX MODE
          DEC     BX                   ;BX IS OFFSET TO LAST DIGIT
          MOV     CX,1                 ;CX = WEIGHT FACTOR
AGAIN:    MOV     AL,[SI+BX]           ;GET THE ASCII DIGIT
          AND     AL,0FH               ;STRIP OFF '3'
          SUB     AH,AH                ;CLEAR AH FOR WORD MULTIPLICATION
          MUL     CX                   ;MULTIPLY BY THE WEIGHT
          ADD     DI,AX                ;ADD IT TO BINARY (HEX) RESULT
          MOV     AX,CX                ;MULTIPLY THE WEIGHT FACTOR
          MUL     TEN                  ; BY TEN
          MOV     CX,AX                ; FOR NEXT ITERATION
          DEC     BX                   ;DECREMENT DIGIT POINTER
          JNS     AGAIN                ;JUMP IF COUNTER >= 0
          MOV     BINNUM,DI            ;SAVE THE BINARY (HEX) RESULT
          MOV     AH,4CH
          INT     21H                  ;GO BACK TO OS
ASC2B_CON ENDP
          END     ASC2B_CON


Program 7-4 


Program 7-4 converts an ASCII number to binary. It assumes the maximum size 
of the decimal number to be 65535. Therefore, the maximum hex result is FFFFH, a 16- 
bit word. It begins with the least significant digit, masks off the 3, and multiplies it by its 
weight factor. Register CX holds the weight, which is 1 for the least significant digit. For 
the next digit CX becomes 10 (0AH), for the next it becomes 100 (64H), and so on. The
program assumes that the least significant ASCII digit is in the highest memory location 
of the data. This is consistent with the conventions of storing ASCII numbers with the 
most significant digit in the lower memory address and the least significant digit in the 
highest memory address. For example, placing '749' at memory offset 200 gives offset 200 
= (37), 201 = (34), and 202 = (39). OS INT 21H function 0AH also places ASCII numbers
this way.

Programs 7-3 and 7-4 have been written and tested with sample data, and now 
can be changed from programs into modules that can be called by any program. 


Binary-to-ASCII module


Program 7-5 is the modularized version of Program 7-3. The procedure is declared as
public, so it can be called by another program. The calling program provides the data,
passing the binary value in AX and the offset of the ASCII storage area in SI. Therefore,
this module does not need its own data segment. Notice the following points about the module:




TITLE     PROG7-5 BINARY TO DECIMAL CONVERSION MODULE
PAGE      60,132
;this module converts a binary (hex) number up to FFFFH to decimal
;then makes it displayable (ASCII)
;CALLING PROGRAM SETS
;    AX = BINARY VALUE TO BE CONVERTED TO ASCII
;    SI = OFFSET ADDRESS WHERE ASCII VALUE IS TO BE STORED
          .MODEL  SMALL
          PUBLIC  B2ASC_CON
          .CODE
B2ASC_CON PROC    FAR
          PUSHF                        ;STORE REGS CHANGED BY THIS MODULE
          PUSH    BX
          PUSH    DX
          MOV     BX,10                ;BX=10 THE DIVISOR
          ADD     SI,4                 ;SI POINTS TO LAST ASCII DIGIT
B2A_LOOP: SUB     DX,DX                ;DX MUST BE 0 IN WORD DIVISION
          DIV     BX                   ;DIVIDE HEX NUMBER BY 10 (BX=10)
          OR      DL,30H               ;TAG '3' TO REMAINDER TO MAKE IT ASCII
          MOV     [SI],DL              ;MOVE THE ASCII DIGIT
          DEC     SI                   ;DECREMENT POINTER
          CMP     AX,0                 ;CONTINUE LOOPING WHILE AX > 0
          JA      B2A_LOOP
          POP     DX                   ;RESTORE REGISTERS
          POP     BX
          POPF
          RET
B2ASC_CON ENDP
          END


Program 7-5

TITLE     PROG7-6 ASCII TO BINARY CONVERSION MODULE
PAGE      60,132
;this module converts any ASCII number between 0 to 65535 to binary
;CALLING PROGRAM SETS
;    SI = OFFSET OF ASCII STRING
;    BX = STRING LENGTH - 1 (USED AS INDEX INTO ASCII NUMBER)
;THIS MODULE SETS
;    AX = BINARY NUMBER
          .MODEL  SMALL
          EXTRN   TEN:WORD
          PUBLIC  ASC2B_CON
          .CODE
ASC2B_CON PROC    FAR
          PUSHF                        ;STORE REGS CHANGED IN THIS MODULE
          PUSH    DI
          PUSH    CX
          SUB     DI,DI                ;CLEAR DI FOR THE BINARY (HEX) RESULT
          MOV     CX,1                 ;CX = WEIGHT FACTOR
A2B_LOOP: MOV     AL,[SI+BX]           ;GET THE ASCII DIGIT
          AND     AL,0FH               ;STRIP OFF '3'
          SUB     AH,AH                ;CLEAR AH FOR WORD MULTIPLICATION
          MUL     CX                   ;MULTIPLY BY THE WEIGHT
          ADD     DI,AX                ;ADD IT TO BINARY (HEX) RESULT
          MOV     AX,CX                ;MULTIPLY THE WEIGHT FACTOR
          MUL     TEN                  ; BY TEN
          MOV     CX,AX                ; FOR NEXT ITERATION
          DEC     BX                   ;DECREMENT DIGIT POINTER
          JNS     A2B_LOOP             ;JUMP IF OFFSET >= 0
          MOV     AX,DI                ;STORE BINARY NUMBER IN AX
          POP     CX                   ;RESTORE REGISTERS AND FLAGS
          POP     DI
          POPF
          RET
ASC2B_CON ENDP
          END


Program 7-6




TITLE     PROG7-7 CALLING PROGRAM TO CONVERT ASCII TO BINARY
PAGE      60,132
          PUBLIC  TEN
          .MODEL  SMALL
          .STACK  64
          .DATA
ASC_AREA  LABEL   BYTE
MAX_LEN   DB      6
ACT_LEN   DB      ?
ASC_NUM   DB      6 DUP (?)
          ORG     10H
BINNUM    DW      0
PROMPT1   DB      'PLEASE ENTER A 5 DIGIT NUMBER','$'
TEN       DW      10
          .CODE
          EXTRN   ASC2B_CON:FAR
MAIN      PROC    FAR
          MOV     AX,@DATA
          MOV     DS,AX
          MOV     AH,09                ;DISPLAY THE PROMPT
          MOV     DX,OFFSET PROMPT1
          INT     21H
          MOV     AH,0AH               ;INPUT STRING
          MOV     DX,OFFSET ASC_AREA
          INT     21H
          MOV     SI,OFFSET ASC_NUM
          MOV     BH,00
          MOV     BL,ACT_LEN
          DEC     BX
          CALL    ASC2B_CON
          MOV     BINNUM,AX            ;SAVE THE BINARY (HEX) RESULT
          MOV     AH,4CH
          INT     21H                  ;GO BACK TO OS
MAIN      ENDP
          END     MAIN


Program 7-7


Points to notice about the binary-to-ASCII module (Program 7-5):

1. Since this module will be called by another module, no entry point and exit point were
   given. Therefore, the END directive does not have the label B2ASC_CON.

2. The module must return to the caller and not OS as was the case in Program 7-3.

3. This module does not need its own data or stack segments.
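
Program 7-7 calls only the ASCII-to-binary module. A calling program for the binary-to-ASCII module could be sketched along the same lines. This is an illustration rather than a listing from the text: the sample value 34DH, the '0' pre-fill of the buffer, and the '$' terminator are assumptions, and it relies on B2ASC_CON behaving as documented in the comments of Program 7-5 (AX = binary value, SI = offset of a 5-byte storage area).

```asm
;Sketch of a calling program for the B2ASC_CON module of Program 7-5.
;The data values and the display step are assumptions for illustration.
          EXTRN   B2ASC_CON:FAR
          .MODEL  SMALL
          .STACK  64
          .DATA
BINNUM    DW      34DH                 ;sample value (845 decimal)
ASCNUM    DB      5 DUP ('0')          ;pre-filled so unwritten leading digits show as '0'
          DB      '$'                  ;string terminator for INT 21H function 09
          .CODE
MAIN      PROC    FAR
          MOV     AX,@DATA
          MOV     DS,AX
          MOV     AX,BINNUM            ;AX = binary value to be converted
          MOV     SI,OFFSET ASCNUM     ;SI = where the ASCII digits are stored
          CALL    B2ASC_CON
          MOV     AH,09                ;display the ASCII result
          MOV     DX,OFFSET ASCNUM
          INT     21H
          MOV     AH,4CH
          INT     21H                  ;GO BACK TO OS
MAIN      ENDP
          END     MAIN
```

Because the module stops dividing once the quotient reaches zero, the positions to the left of the most significant digit are never written. With the pre-fill shown here the display should therefore read 00845. Note also that the module changes AX and SI, so a caller that still needs those values would have to save them first.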


ASCII-to-binary module


Program 7-6 is the modularized version of Program 7-4. Notice the following 
points about the module: 


1. TEN is defined in the calling program. 
2. This module must return to the caller and not OS. 


Calling module 


Program 7-7 shows the calling program for the module that converts ASCII to 
binary. This program sets up the data segment, inputs the ASCII data from the keyboard, 
places it in memory, then calls the routine to convert the number to binary. Finally, the 
hex result is stored in memory. The flowchart and pseudocode for the main module and 
the subroutine are given next. 


Review Questions 


1. Show a step-by-step analysis of Program 7-3 with data F624H. Show the sequence 
of instructions and the data values. 

2. Show a step-by-step analysis of Program 7-4 with data '1456'. Show the sequence of 
instructions and the data values. 




Flowchart and Pseudocode for Program 7-7


Main Module

    Prompt for string
    Input string
    Spoint <- points to start of string
    Dpoint <- points to end of string
    CALL procedure to convert ASCII to binary
        Send: Spoint, Dpoint
        Return: binary result
    Store binary result


Procedure to convert ASCII to binary

    Receive:
        Spoint (points to start of string)
        Dpoint (points to end of string)

    Result = 0
    Weight = 1
    REPEAT
        Get Digit pointed to by Dpoint
        Strip off ASCII '3'
        Multiply Digit by Weight
        Add Digit to Result
        Multiply Weight by 10
        Decrement Dpoint
    UNTIL (all bytes converted)

    Return:
        Binary result


SECTION 7.3: PASSING PARAMETERS AMONG MODULES 


Occasionally, there is a need to pass parameters among different Assembly lan- 
guage modules or between Assembly language and C/C++, C#, and Visual Basic language 
programs. The parameter could be fixed values, variables, arrays of data, or even pointers 
to memory. Parameters can be passed from one module to another through registers, mem- 
ory, or the stack. In this section we explore passing parameters between Assembly lan- 
guage modules. 


Passing parameters via registers 


When there is a need to pass parameters among various modules, one could use 
the CPU's registers. For example, if a main routine is calling a subroutine, the values are 
placed in the registers in the main routine and then the subroutine is called upon to process 
the data. In such cases the programmer must clearly document the registers used for the 
incoming data and the registers that are expected to have the result after the execution of 
the subroutine. In Chapter 4 this concept was demonstrated with INT 21H and INT 10H. 
Program 7-7 demonstrated this method. In that program, registers BX and SI were set to 
point to certain data items before the module was called, and the called module placed its 
result in register AX prior to returning to the calling routine. 
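
The documentation habit described above can be shown on a very small scale. The following is a minimal sketch under stated assumptions, not a program from the text: the names VAL1, VAL2, RESULT, and ADD_WORDS are hypothetical, and the subroutine is kept in the same file as a NEAR procedure so that the example stands on its own.

```asm
;Sketch: passing two operands to a subroutine in registers.
;VAL1, VAL2, RESULT, and ADD_WORDS are hypothetical names.
          .MODEL  SMALL
          .STACK  64
          .DATA
VAL1      DW      1234H
VAL2      DW      0111H
RESULT    DW      ?
          .CODE
MAIN      PROC    FAR
          MOV     AX,@DATA
          MOV     DS,AX
          MOV     AX,VAL1              ;first operand goes in AX
          MOV     BX,VAL2              ;second operand goes in BX
          CALL    ADD_WORDS
          MOV     RESULT,AX            ;RESULT = 1345H
          MOV     AH,4CH
          INT     21H                  ;GO BACK TO OS
MAIN      ENDP

;ADD_WORDS
;    ENTRY: AX, BX = word-sized operands
;    EXIT:  AX = AX + BX, carry out (if any) left in CF
;           BX is unchanged
ADD_WORDS PROC    NEAR
          ADD     AX,BX
          RET
ADD_WORDS ENDP
          END     MAIN
```

The comment block above ADD_WORDS is the important part: it states which registers carry the incoming data, which register holds the result, and which registers the caller can rely on being preserved.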


Passing parameters via memory 


Although parameter passing via registers is widely used in many of the OS and 
BIOS interrupt function calls, the limited number of registers inside the CPU is a major 
limitation associated with this method of parameter passing. This makes register manage- 
ment a cumbersome task. One alternative is to pass parameters via memory by defining 
an area of RAM and passing parameters to these RAM locations. OS and IBM BIOS use 
this method frequently. The problem with passing parameters to a fixed area of memory 
is that there must be a universal agreement to the address of the memory area in order to 
make sure that modules can be run on the hardware and software of various companies. 
This kind of standardization is hard to come by. The only reason that BIOS and OS use an 
area of memory for passing parameters is because IBM and Microsoft worked closely 
together to decide on the memory addresses. Another option, and indeed the most widely 
used method of passing parameters, is via the stack, as discussed next. Passing parameters 
via the stack makes the parameters both register and memory independent. 


Passing parameters via the stack 


The stack is a very critical part of every program and playing with it can be risky. 
When a module is called, it is the stack that holds the address where the program must 
return after execution. Therefore, if the contents of the stack are altered, the program can 
crash. This is the reason that working with the stack and passing parameters through it 
must be understood very thoroughly before one embarks on it. 

Program 7-8 demonstrates this method of parameter passing and is written with 
the following requirements. The main module gets three word-sized operands from the 
data segment, stores them on the stack, and then calls the subroutine. The subroutine gets 
the operands from the stack, adds them together, holds the result in a register, and then 
returns control to the main module. The main module stores the result of the addition. 
Following the program is a detailed stack contents analysis that will show how the param- 
eters are stored on the stack by the main routine and retrieved from the stack by the called 
routine. 


Stack contents analysis for Program 7-8 

To clarify the concept of parameter passing through the stack, the following is a 
step-by-step analysis of the stack pointer and stack contents. Assume that the stack point- 
er has the value SP = 17FEH before the "PUSH VALUE3" instruction in the main mod- 
ule is executed. 




TITLE    PROG7-8 PASSING PARAMETERS VIA THE STACK 
PAGE     60,132 
         .MODEL SMALL 
         EXTRN   SUBPROG6:FAR 
         .STACK  64 
         .DATA 
VALUE1   DW      3F62H 
VALUE2   DW      1979H 
VALUE3   DW      25F1H 
RESULT   DW      2 DUP (?) 
         .CODE 
MAIN     PROC    FAR 
         MOV     AX,@DATA 
         MOV     DS,AX 
         PUSH    VALUE3          ;SAVE VALUE3 ON STACK 
         PUSH    VALUE2          ;SAVE VALUE2 ON STACK 
         PUSH    VALUE1          ;SAVE VALUE1 ON STACK 
         CALL    SUBPROG6        ;CALL THE ADD ROUTINE 
         MOV     RESULT,AX       ;STORE 
         MOV     RESULT+2,BX     ;THE RESULT 
         MOV     AH,4CH 
         INT     21H 
MAIN     ENDP 
         END     MAIN 

;---------------------- in a separate file: ---------------------- 
TITLE    SUBPROG6 MODULE TO ADD THREE WORDS BROUGHT IN FROM THE STACK 
PAGE     60,132 
         .MODEL SMALL 
         PUBLIC  SUBPROG6 
         .CODE 
SUBPROG6 PROC    FAR 
         SUB     BX,BX           ;CLEAR BX FOR CARRIES 
         PUSH    BP              ;SAVE BP 
         MOV     BP,SP           ;SET BP FOR INDEXING 
         MOV     AX,[BP]+6       ;MOV VALUE1 TO AX 
         MOV     CX,[BP]+8       ;MOV VALUE2 TO CX 
         MOV     DX,[BP]+10      ;MOV VALUE3 TO DX 
         ADD     AX,CX           ;ADD VALUE2 TO VALUE1 
         ADC     BX,00           ;KEEP THE CARRY IN BX 
         ADD     AX,DX           ;ADD VALUE3 
         ADC     BX,00           ;KEEP THE CARRY IN BX 
         POP     BP              ;RESTORE BP BEFORE RETURNING 
         RET     6               ;RETURN AND ADD 6 TO SP TO BYPASS DATA 
SUBPROG6 ENDP 
         END 

Program 7-8: Module 2 


1. VALUE3 = 25F1H is pushed and SP = 17FC (remember little endian: low byte to low 
   address and high byte to high address). 
2. VALUE2 = 1979H is pushed and then SP = 17FA. 
3. VALUE1 = 3F62H is pushed and then SP = 17F8. 
4. CALL SUBPROG6 is a FAR call; therefore, both CS and IP are pushed onto the stack, 
   making SP = 17F4. If it had been a near call, only IP would have been saved. 
5. In the subprogram module, register BP is saved by PUSHing BP onto the stack, which 
   makes SP = 17F2. In the subprogram, BP is used to access values in the stack. First 
   SP is copied to BP since only BP can be used in indexing mode with the stack seg- 
   ment (SS) register. In other words, "MOV AX,[SP+4]" will cause an error. 
   "MOV AX,[BP]+6" loads VALUE1 into AX. [BP] + 6 = 17F2 + 6 = 17F8, which is 
   exactly where VALUE1 is located. Similarly, BP + 8 = 17F2 + 8 = 17FA is the place 
   where VALUE2 is located, and BP + 10 = 17F2H + 10 = 17FCH is the location of 
   VALUE3. 
6. After all the parameters are brought into the CPU by the present module and are 
   processed (in this case added), the module restores the original BP contents by 
   POPping BP from the stack. Then SP = 17F4. 
7. RET 6: This is a new instruction. The RETurns shown previously did not have numbers 
   right after them. The "RET n" instruction means first to POP CS:IP (IP only if the 
   CALL was NEAR) off the top of the stack and then add n to the SP. As can be seen 
   from the Program 7-8 diagram, after popping CS and IP off the stack, the stack 
   pointer is incremented by four, making SP = 17F8. Then adding 6 to it to bypass 
   the six locations of the stack where the parameters are stored makes SP = 17FEH, 
   its original value. Now what would happen if the program had a plain RET 
   instruction instead of "RET 6"? Every time the subprogram was executed, the stack 
   would lose six locations. In the example above, the next call to the same routine 
   would start with the stack at 17F8 instead of 17FE. If this practice of losing 
   some area of the stack continues, eventually the stack could be reduced to a point 
   where the program would run out of stack and crash. 
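
The stack pointer arithmetic in the steps above can be checked with a small C model. This is a sketch under stated assumptions, not part of Program 7-8: the names ss_mem, push16, pop16, word_at, and trace_prog7_8 are invented for the illustration, and the CS and IP values pushed by the far CALL are arbitrary placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t  ss_mem[0x10000];   /* model of a 64 KB stack segment */
static uint16_t sp;                /* model of the SP register */

static void push16(uint16_t v)
{
    sp = (uint16_t)(sp - 2);
    ss_mem[sp] = (uint8_t)(v & 0xFF);                /* low byte, low address */
    ss_mem[(uint16_t)(sp + 1)] = (uint8_t)(v >> 8);  /* high byte, high address */
}

static uint16_t word_at(uint16_t off)
{
    return (uint16_t)(ss_mem[off] | (ss_mem[(uint16_t)(off + 1)] << 8));
}

static uint16_t pop16(void)
{
    uint16_t v = word_at(sp);
    sp = (uint16_t)(sp + 2);
    return v;
}

/* Walk through the pushes, the far CALL, the body of the subprogram, and
   "RET n". Returns the final SP and stores the 32-bit sum through *sum. */
uint16_t trace_prog7_8(uint16_t start_sp, uint16_t ret_n, uint32_t *sum)
{
    uint16_t bp = 0;              /* caller's BP; its value does not matter here */

    assert(sum != NULL);
    sp = start_sp;
    push16(0x25F1);               /* PUSH VALUE3 */
    push16(0x1979);               /* PUSH VALUE2 */
    push16(0x3F62);               /* PUSH VALUE1 */
    push16(0x1000);               /* far CALL pushes CS (placeholder value) */
    push16(0x0020);               /* ... and then IP (placeholder value) */
    push16(bp);                   /* PUSH BP */
    bp = sp;                      /* MOV BP,SP */
    *sum = (uint32_t)word_at((uint16_t)(bp + 6))     /* VALUE1 */
         + word_at((uint16_t)(bp + 8))               /* VALUE2 */
         + word_at((uint16_t)(bp + 10));             /* VALUE3 */
    bp = pop16();                 /* POP BP */
    (void)pop16();                /* RET pops IP */
    (void)pop16();                /* ... and CS */
    sp = (uint16_t)(sp + ret_n);  /* "RET n" then adds n to SP */
    (void)bp;
    return sp;
}
```

Calling trace_prog7_8(0x17FE, 6, &sum) models "RET 6" and returns SP to 17FEH, while passing 0 for ret_n models a plain RET and leaves SP at 17F8H, six bytes short of where it started.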


Program 7-8: Stack 
Contents Diagram 


Review Questions 


1. List one advantage and one disadvantage of each method of parameter passing. 
(a) via register (b) via stack (c) via memory 
2. Assume that we would like to access some parameters from the stack. Which of the 
following are correct ways of accessing the stack? 
(a) MOV AX,[BP]+20 (b) MOV AX,[SP]+20 
(c) MOV AX,[BP+DI] (d) MOV AX,[SP+SI] 


PROBLEMS 


SECTION 7.1: WRITING AND LINKING MODULES 


1. Fill in the blanks in the following program. The main program defines the data and 
   calls another module to add 5 bytes of data, then saves the result. Note: Some blanks 
   may not need anything. 

         .MODEL SMALL 
         .STACK 100H 
         .DATA 
         PUBLIC  ______ 
DATA1    DB      25,12,24,56,98 
RESULT   DW      ? 
         .CODE 
         EXTRN   ______:FAR 
HERE:    MOV     AX,@DATA 
         MOV     DS,AX 
         CALL    SUM 
         MOV     AH,4CH 
         INT     21H 
         END     ______ 

   In another file there is the module for summing 5 bytes of data: 

         .MODEL SMALL 
         ______  DATA1:BYTE 
         ______  RESULT:WORD 
COUNT    EQU     5 
         .CODE 
         ______  SUM 
SUM      PROC    ______ 
         MOV     BX,OFFSET DATA1 
         SUB     AX,AX 
         MOV     CX,COUNT 
AGAIN:   ADD     AL,BYTE PTR [BX] 
         ADC     AH,0 
         INC     BX 
         LOOP    AGAIN 
         MOV     RESULT,AX 
         RET 
______   ENDP 
         END 
2. If a label or parameter is not defined in a module, it must be declared as ______. 
3. If a label or parameter is used by other modules, it must be declared as ______ in 
   the present module. 

4. List the options for the EXTRN directive when it is referring to a procedure. 
5. List the options for the EXTRN directive when it is referring to a data item. 
6. List the options for the PUBLIC directive when it is referring to a procedure. 
7. List the options for the PUBLIC directive when it is referring to a data item. 
8. Convert Program 4-1 to the modular format, making each of the INT subroutines a 
   separate module. Each module should be NEAR. Assemble and test the program. 


SECTION 7.2: SOME VERY USEFUL MODULES 


9. Write a program that accepts two unsigned numbers (each less than 999) from the 
keyboard, converts them to hex, takes the average, and displays the result on the mon- 
itor. Use the hex-to-decimal and ASCII-to-hex conversion modules in the text. 


SECTION 7.3: PASSING PARAMETERS AMONG MODULES 


10. Write a program (similar to Program 7-1) with the following components. 

(a) In the main program, two values are defined: 1228 and 52400. 

(b) The main program calls two separate modules, passing the values by stack. 

(c) In the first module, the two numbers are multiplied and the result is passed back 
to the main module. 

(d) The second module performs division of the two numbers (52400 /122) and 
passes both the quotient and remainder back to the main program to be stored. 

(e) Analyze the stack and its contents for each module if SP = FFF8H immediately 
before the first CALL instruction in the main module. 




ANSWERS TO REVIEW QUESTIONS 


SECTION 7.1: WRITING AND LINKING MODULES 


1. (1) Each module can be developed individually, allowing parallel development of 
   modules, which shortens development time; (2) easier to locate source of bugs; (3) 
   these modules can be linked with high-level languages such as C. 
2. PUBLIC 
3. EXTRN 
4. The module that is the entry and exit point will have a label after the END statement. 
5. A paragraph consists of 16 bytes and begins on an address ending in 0H. 
6. name SEGMENT PARA STACK 'STACK' 
7. EXTRN TOTAL:WORD 
8. 60H 
9. name SEGMENT PARA PUBLIC 'CODE' 


SECTION 7.2: SOME VERY USEFUL MODULES 


1. 1st iteration: AX = F624   F624/A = 189D   remainder DL = 2 
   2nd iteration: AX = 189D   189D/A = 0276   remainder DL = 1 
   3rd iteration: AX = 0276   0276/A = 003F   remainder DL = 0 
   4th iteration: AX = 003F   003F/A = 0006   remainder DL = 3 
   5th iteration: AX = 0006   0006/A = 0000   remainder DL = 6 
   AX is now zero, so the conversion is complete: F624H = 63,012 

2. 1st iteration: AL = 36   06 x 1 = 6           DI = 6 
   2nd iteration: AL = 35   05 x A = 32          DI = 6 + 32 = 38 
   3rd iteration: AL = 34   04 x 64 = 190        DI = 38 + 190 = 1C8 
   4th iteration: AL = 31   01 x 03E8 = 03E8     DI = 1C8 + 03E8 = 05B0 
   5th iteration: AL = 30   00 x 2710 = 0        DI = 05B0 

   BX has been decremented from 4 to 0, and is now -1, so the conversion is complete: 
   01456 = 05B0H 


SECTION 7.3: PASSING PARAMETERS AMONG MODULES 


1. (a) By register; one advantage is the execution speed of registers; one disadvantage 
   is that there is a limited number of registers available so that not many values can be 
   passed. 
   (b) By stack; one advantage is it does not use up available registers; one disadvantage 
   is that errors in processing the stack can cause the system to crash. 
   (c) By memory; one advantage is a large area available to store data; one disadvan- 
   tage is that the program would not be portable to other computers. 

2. (a) and (c) are correct; (b) and (d) are not correct because SP cannot be used in index- 
   ing mode with SS. 




CHAPTER 8 


32-BIT PROGRAMMING 
FOR x86 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Discuss the major differences between the 16-bit and 32-bit CPUs 
>> List the 32-bit registers of the x86 CPU 
>> Diagram the register sizes available in the 32-bit CPUs 
>> Explain the difference in register usage between the 16-bit and 
   the 32-bit systems 
>> Discuss how the increased register size of the 32-bit systems relates to an 
   increased memory range 
>> Diagram how the "little endian" storage convention of x86 machines 
   stores doubleword-sized operands 
>> Code programs for the 32-bit CPU using extended registers and new 
   directives 
>> Code arithmetic statements using the extended registers of the 32-bit 
   CPUs 
>> Code Assembly language within C programs by using in-line coding 


All programs discussed so far used 16-bit registers in x86 computers. In this chap- 
ter we discuss the 32-bit registers of x86 microprocessors in addition to combining C and 
Assembly languages. In Section 8.1, we discuss the 32-bit registers, and some program 
examples will be given that use the 32-bit capability of x86 machines. We also show how 
to combine Assembly and C in Visual Studio .NET. 


SECTION 8.1: 32-BIT PROGRAMMING IN x86 


In this section we concentrate on some of the most important differences between 
the 16-bit and 32-bit modes. While in the 8086/286 microprocessors the registers were 
16 bits wide, in the 386 and higher CPUs the registers were extended to 32 bits. The 
register names were changed to reflect this extension: AX became EAX, BX became 
EBX, and so on, as illustrated below and outlined in Table 8-1. For 
example, the 386 and higher CPUs contain registers AL, AH, AX, and EAX with 8, 8, 16, 
and 32 bits, respectively. In 16-bit CPUs, register AX is accessible as AL, AH, or 
AX, while in 32-bit CPUs, register EAX can be accessed only as AL, AH, AX, or EAX. 
In other words, the upper 16 bits of EAX are not accessible as a separate register. The 
same rule applies to EBX, ECX, and EDX. See Figure 8-1. Registers DI, SI, BP, and SP 
have become EDI, ESI, EBP, and ESP, respectively. In addition to the CS, DS, SS, and 
ES segment registers, there are also two new segment registers: FS and GS. There are also 
several control registers: CR0, CR1, CR2, and CR3. See Table 8-1. 
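
The relationship among EAX, AX, AH, and AL can be modeled with masks and shifts. The following C sketch is only an illustration of which bits each name covers; the type eax_parts and the function split_eax are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdint.h>

/* The sub-registers that share storage with EAX. */
typedef struct {
    uint8_t  al;   /* bits 0-7   */
    uint8_t  ah;   /* bits 8-15  */
    uint16_t ax;   /* bits 0-15  */
} eax_parts;

/* Extract AL, AH, and AX from a 32-bit EAX value. The upper 16 bits
   (bits 16-31) have no register name of their own. */
eax_parts split_eax(uint32_t eax)
{
    eax_parts p;
    p.al = (uint8_t)(eax & 0xFF);
    p.ah = (uint8_t)((eax >> 8) & 0xFF);
    p.ax = (uint16_t)(eax & 0xFFFF);
    assert(p.ax == (uint16_t)((p.ah << 8) | p.al));   /* AX is AH:AL */
    return p;
}
```

With EAX = 41992F56H, this gives AL = 56H, AH = 2FH, and AX = 2F56H; the upper half, 4199H, can be reached only by operating on EAX as a whole (for example, by shifting it).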


[Figure: register AX with bits 15-8 (AH) and bits 7-0 (AL), shown alongside the 32-bit 
register EAX of the 386 and higher CPUs] 

Figure 8-1: 16-bit vs. 32-bit Mode 


32-bit general registers as pointers 


Another major change from 16-bit to 32-bit is the ability of general registers such 
as EAX, ECX, and EDX to be used as pointers. As shown in previous chapters, AX, CX, 
and DX could not be used as pointers. For example, an instruction such as "MOV 
CL,[AX]" would have caused an error in the 16-bit since only BX, SI, DI, and BP were 
allowed to be used as pointers to memory. This has changed. Starting with the 386 CPUs, 
the following instructions are perfectly legal: 


         MOV     AX,[ECX] 
         ADD     SI,[EDX] 
         OR      EBX,[EAX]+20 


Table 8-1: Registers of the 32-bit x86 by Category 

Category      Bits   Register Names 
General       32     EAX, EBX, ECX, EDX 
              16     AX, BX, CX, DX 
              8      AH, AL, BH, BL, CH, CL, DH, DL 
Pointer       32     ESP (extended SP), EBP (extended BP) 
              16     SP (stack pointer), BP (base pointer) 
Index         32     ESI (extended SI), EDI (extended DI) 
              16     SI (source index), DI (destination index) 
Segment       16     CS (code segment), DS (data segment), 
                     SS (stack segment), ES (extra segment), 
                     FS (extra segment), GS (extra segment) 
Instruction   32     EIP (extended instruction pointer) 
Flag          32     EFR (extended flag register) 
Control       32     CR0, CR1, CR2, CR3 

Note: Only bit 0 of CR0 is available in real mode. All other control registers are available in pro- 
tected mode only. 


It must be noted that when EAX, ECX, or EDX are used as offset addresses, DS 
is the default segment register. In general, SS is the default segment register for ESP 
and EBP, CS for EIP, and DS for all other registers. Table 8-2 summarizes addressing 
modes for 32-bit programming of x86 processors. 


Table 8-2: Addressing Modes for 32-bit Programming 

Addressing Mode          Operand            Default Segment 
Register                 register           none 
Immediate                data               none 
Direct                   [offset]           DS 
Register indirect        [BX]               DS 
                         [SI]               DS 
                         [DI]               DS 
                         [EAX]              DS 
                         [EBX]              DS 
                         [ECX]              DS 
                         [EDX]              DS 
                         [ESI]              DS 
                         [EDI]              DS 
Based relative           [BX]+disp          DS 
                         [BP]+disp          SS 
                         [EAX]+disp         DS 
                         [EBX]+disp         DS 
                         [ECX]+disp         DS 
                         [EDX]+disp         DS 
                         [EBP]+disp         SS 
Indexed relative         [DI]+disp          DS 
                         [SI]+disp          DS 
                         [EDI]+disp         DS 
                         [ESI]+disp         DS 
Based indexed relative   [R1][R2]+disp      If BP is used, the segment is SS; 
                         where R1 and R2    otherwise, DS is the segment 
                         are any of the 
                         above 

Note: In based indexed relative addressing, disp is optional. 


Accessing 32-bit registers with commonly used assemblers 


In Assembly language the directive ".386" is used to access the 32-bit registers of 
386 and higher CPUs and to employ the new instructions of the 386 microprocessor. 
Every new generation of x86 has some new instructions that do not execute on lower 
processors, meaning that they are upwardly compatible. In other words, using the ".386" 
directive in a program means that the program must be run only on 386 and higher CPUs 
and cannot be run on 8086/286. In contrast, all programs in previous chapters were writ- 
ten to be run on any x86 computer. The following are additional assembler directives, 
which indicate the type of microprocessor supported by Microsoft's assembler (MASM). 


MASM     Meaning 
.8086    will run on any x86 CPU (default) 
.386     will run on any 386 and higher CPU; also 
         allows use of new 386 instructions 


Write a program using the 32-bit registers of the 386 to add the values 100000, 200000, and 
40000. Then subtract from the total sum the values 80000, 35000, and 250. Place the result in 
memory locations allocated using the DD, doubleword directive, used for 32-bit numbers. 


TITLE    ADD AND SUBTRACT USING 32-BIT REGISTERS IN 386 MACHINES 
PAGE     60,132 
         .MODEL SMALL 
         .386 
         .STACK  200H 
         .DATA 
RESULT   DD      ? 
         .CODE 
BEGIN:   MOV     AX,@DATA 
         MOV     DS,AX 
         SUB     EAX,EAX 
         ADD     EAX,100000      ;EAX = 186A0H 
         ADD     EAX,200000      ;EAX = 186A0H + 30D40H = 493E0H 
         ADD     EAX,40000       ;EAX = 493E0H + 9C40H = 53020H 
         SUB     EAX,80000       ;EAX = 53020H - 13880H = 3F7A0H 
         SUB     EAX,35000       ;EAX = 3F7A0H - 88B8H = 36EE8H 
         SUB     EAX,250         ;EAX = 36EE8H - FAH = 36DEEH (224750) 
         MOV     RESULT,EAX 
         MOV     AH,4CH 
         INT     21H 
         END     BEGIN 

Program 8-1 


Program 8-1 demonstrates the use of the ".386" directive and the 80386 32-bit 
instructions. The simplified segment definition was used. The program used the 32-bit 
register EAX to add and subtract values of various size to demonstrate 32-bit program- 
ming of the x86. Now the question is how to run this program and see the register con- 
tents in x86 machines. Unfortunately, the DEBUG utility used in earlier chapters cannot 
be used since it shows only the 16-bit registers. In many assemblers, including MASM, 
there are advanced debugging tools that one can use to see the execution of the 32-bit pro- 
grams. In the case of MASM, the CodeView utility is the tool that allows one to monitor 
the execution of 16-bit in addition to 32-bit programs. 
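
As a cross-check that does not depend on a debugger, the arithmetic of Program 8-1 can be reproduced with 32-bit unsigned integers in C. This is a sketch for verification only; the function name add_sub_32 and the steps array are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror Program 8-1: clear the accumulator, add three values, then
   subtract three values. steps[i] receives the accumulator after the
   i-th ADD or SUB so that each comment in the listing can be checked. */
uint32_t add_sub_32(uint32_t steps[6])
{
    static const uint32_t add_vals[3] = {100000u, 200000u, 40000u};
    static const uint32_t sub_vals[3] = {80000u, 35000u, 250u};
    uint32_t eax = 0;                       /* SUB EAX,EAX */
    size_t i;

    assert(steps != NULL);
    for (i = 0; i < 3; ++i) {
        eax += add_vals[i];                 /* ADD EAX,imm32 */
        steps[i] = eax;
    }
    for (i = 0; i < 3; ++i) {
        eax -= sub_vals[i];                 /* SUB EAX,imm32 */
        steps[3 + i] = eax;
    }
    return eax;
}
```

The intermediate values are 186A0H, 493E0H, 53020H, 3F7A0H, 36EE8H, and finally 36DEEH (224750). Every one of them exceeds FFFFH, which is why a single 16-bit register could not hold them.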

A trace of Program 8-1 in Microsoft's CodeView program is shown in Figure 8-2. 
To examine the 32-bit registers in CodeView, press F2 to display reg- 
isters; then select the Options menu from the top of the screen, and a drop-down menu 
appears. Select "386" from the drop-down menu to display the registers in 32-bit format. 


File View Search Run Watch Options Language Calls Help | F8=Trace F5=Go 

4833:0000 B83648          MOV     AX,4836 
4833:0003 8ED8            MOV     DS,AX 
4833:0005 662BC0          SUB     EAX,EAX 
4833:0008 6605A0860100    ADD     EAX,000186A0 
4833:000E 6605400D0300    ADD     EAX,00030D40 
4833:0014 6605409C0000    ADD     EAX,00009C40 
4833:001A 662D80380100    SUB     EAX,00013880 
4833:0020 662DB8880000    SUB     EAX,000088B8 
4833:0026 662DFA000000    SUB     EAX,000000FA 
4833:0030 B44C            MOV     AH,4C 
4833:0032 CD21            INT     21 

EAX=00036DEE   EBX=00000000   ECX=00000000   EDX=00000000 
ESP=00000200   EBP=00000000   ESI=00000000   EDI=00000000 
DS=4836   ES=4823   FS=0000   GS=0000   IP=0000002C 

Figure 8-2: CodeView Screen for Execution of Program 8-1 


Little endian revisited 


In analyzing how the x86 stores 32-bit data in memory or loads a 32-bit operand 
into a register, recall the little endian convention: The low byte goes to the low address 
and the high byte to the high address. For example, an instruction such as "MOV 
RESULT,EAX" in Program 8-1 will store the data in this way: 


OFFSET       CONTENTS 
RESULT       d0-d7 
RESULT+1     d8-d15 
RESULT+2     d16-d23 
RESULT+3     d24-d31 


Example 8-1 


Assuming that SI = 1298 and EAX = 41992F56H, show the contents of memory locations after 
the instruction "MOV [SI],EAX". 


Solution (in hex): 


DS:1298 = (56) 
DS:1299 = (2F) 
DS:129A = (99) 
DS:129B = (41) 
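
The byte order in Example 8-1 can be reproduced in C by writing each byte explicitly, which gives the same layout regardless of the machine that compiles the code. The helper names store_dword_le and load_dword_le are hypothetical and serve only to illustrate the convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value the way the x86 does: low byte at the lowest address. */
void store_dword_le(uint8_t dst[4], uint32_t value)
{
    assert(dst != NULL);
    dst[0] = (uint8_t)(value & 0xFF);          /* d0-d7   */
    dst[1] = (uint8_t)((value >> 8) & 0xFF);   /* d8-d15  */
    dst[2] = (uint8_t)((value >> 16) & 0xFF);  /* d16-d23 */
    dst[3] = (uint8_t)((value >> 24) & 0xFF);  /* d24-d31 */
}

/* Reassemble the 32-bit value from four little endian bytes. */
uint32_t load_dword_le(const uint8_t src[4])
{
    assert(src != NULL);
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```

Storing 41992F56H produces the bytes 56, 2F, 99, 41 at increasing addresses, matching the solution above, and loading them back returns the original value.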




Some more examples of 32-bit programming 


One way to increase the processing power of the microprocessor is to widen the 
register size. This allows processing large numbers as a whole rather than breaking them 
into smaller chunks to fit into small registers. The 32-bit registers have become standard 
in all recent microprocessors. Powerful supercomputers use 64-bit registers. In this sec- 
tion we show revisions of some earlier programs using the 32-bit capability of the x86 
machines to see the impact of the wider registers in programming. By comparing 32-bit 
versions of these programs with the 16-bit versions, one can see the increased efficiency 
of 32-bit coding. The impact on speed is discussed in the final section. 


Adding 16-bit words using 32-bit registers 


Program 3-1b used 16-bit registers for adding several words of data. The sum was 
accumulated in one register and another register was used to add up the carries. This is 
not necessary when using 32-bit registers. First, refresh your memory by looking at 
Program 3-1b and then examine Program 8-2, a 32-bit version of the same program, writ- 
ten for 386 and higher CPUs. 
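
The difference between the two approaches can be sketched in C. The functions sum16_with_carry and sum32 are hypothetical names, and the five data values used in the note below are illustrative; the point is only that a 32-bit accumulator makes the separate carry register unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit style: a 16-bit sum plus a separate count of carries. */
void sum16_with_carry(const uint16_t *data, size_t n,
                      uint16_t *sum, uint16_t *carries)
{
    uint16_t acc = 0;
    uint16_t c = 0;
    size_t i;

    assert(data != NULL && sum != NULL && carries != NULL);
    for (i = 0; i < n; ++i) {
        uint16_t next = (uint16_t)(acc + data[i]);
        if (next < acc)
            ++c;                 /* result wrapped past FFFFH: carry out */
        acc = next;
    }
    *sum = acc;
    *carries = c;
}

/* 32-bit style: one wide accumulator, no carry bookkeeping. */
uint32_t sum32(const uint16_t *data, size_t n)
{
    uint32_t eax = 0;
    size_t i;

    assert(data != NULL);
    for (i = 0; i < n; ++i)
        eax += data[i];
    return eax;
}
```

For the values 27345, 28521, 29533, 30105, and 32375, the 16-bit version ends with a sum of 41A7H and two carries, while the 32-bit version returns 241A7H (147879) directly.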


TITLE    REVISION OF PROGRAM 3-1B USING 32-BIT REGISTERS 
PAGE     60,132 
         .MODEL SMALL 
         .386 
         .STACK  200H 
         .DATA 
DATA1    DD      27345,28521,29533,30105,32375 
SUM      DD      ? 
COUNT    EQU     5 
         .CODE 
BEGIN:   MOV     AX,@DATA 
         MOV     DS,AX 
         MOV     CX,COUNT            ;CX is loop counter 
         MOV     SI,OFFSET DATA1     ;SI is data pointer 
         SUB     EAX,EAX             ;EAX will hold sum 
BACK:    ADD     EAX,DWORD PTR [SI]  ;add next dword to EAX 
         ADD     SI,4                ;SI points to next dword 
         DEC     CX                  ;decrement loop counter 
         JNZ     BACK                ;continue adding 
         MOV     SUM,EAX             ;store sum 
         MOV     AH,4CH 
         INT     21H 
         END     BEGIN 


Program 8-2 


Adding multiword data in 32-bit 


In Program 3-2, two multiword numbers were added using 16-bit registers. Each 
number could be as large as 8 bytes wide. That program required a total of four iterations. 
Using the 32-bit registers of the 386/486 requires only two iterations, as shown in Program 
8-3a. This loop version of the multiword addition program is very long and inefficient. It 
can be made more efficient by saving the flag register that holds the carry bit of the first 
32-bit addition on the stack and then adding 4 to each pointer instead of incrementing the 
pointers four times. The loop is shown below in Program 8-3b. 

Due to the high penalty associated with branch instructions such as the LOOP and 
Jcondition instructions in the 386 and higher CPUs, it is better to use the nonloop version 
of this program, shown in Program 8-4. 

First notice that the data is stored exactly the same way as in the loop version of 
the program. Data directive DQ is used to set up storage for the 8-byte numbers. First, the 
lower dword (4 bytes) of DATA1 is moved into EAX, and the lower dword of DATA2 is 


Rewrite Program 3-2 in Chapter 3 to add two 8-byte operands using 32-bit registers. 


TITLE    ADD TWO 8-BYTE NUMBERS USING 32-BIT REGISTERS IN THE 386 
PAGE     60,132 
         .MODEL SMALL 
         .386 
         .STACK  200H 
         .DATA 
DATA1    DQ      548FB9963CE7H 
         ORG     0010H 
DATA2    DQ      3FCD4FA23B8DH 
         ORG     0020H 
DATA3    DQ      ? 
         .CODE 
BEGIN:   MOV     AX,@DATA 
         MOV     DS,AX 
         CLC                          ;clear carry before first addition 
         MOV     SI,OFFSET DATA1      ;SI is pointer for operand1 
         MOV     DI,OFFSET DATA2      ;DI is pointer for operand2 
         MOV     BX,OFFSET DATA3      ;BX is pointer for the sum 
         MOV     CX,02                ;CX is the loop counter 
BACK:    MOV     EAX,DWORD PTR [SI]   ;move the operand to EAX 
         ADC     EAX,DWORD PTR [DI]   ;add the operand to EAX 
         MOV     DWORD PTR [BX],EAX   ;store the sum 
         INC     SI                   ;point to next dword of operand1 
         INC     SI 
         INC     SI 
         INC     SI 
         INC     DI                   ;point to next dword of operand2 
         INC     DI 
         INC     DI 
         INC     DI 
         INC     BX                   ;point to next dword of sum 
         INC     BX 
         INC     BX 
         INC     BX 
         LOOP    BACK                 ;if not finished, continue adding 
         MOV     AH,4CH 
         INT     21H 
         END     BEGIN 


Program 8-3a 


;this revision of Program 8-3a shows how to save the flags 
;before updating the pointers 
BACK:    MOV     EAX,DWORD PTR [SI]   ;move the operand to EAX 
         ADC     EAX,DWORD PTR [DI]   ;add the operand to EAX 
         MOV     DWORD PTR [BX],EAX   ;store the sum 
         PUSHF                        ;save the flags 
         ADD     SI,4                 ;point to next dword of operand1 
         ADD     DI,4                 ;point to next dword of operand2 
         ADD     BX,4                 ;point to next dword of sum 
         POPF                         ;restore the flags 
         LOOP    BACK                 ;if not finished, continue adding 


Program 8-3b 


added to EAX. Then the upper dword of DATA1 is moved into EBX, and the upper dword 
of DATA2 is added to EBX, with any carry that may have been generated in the addition 
of the lower dwords. EAX now holds the lower 4 bytes of the result, and EBX holds the 
upper 4 bytes of the result. 

Program 8-4 is much more efficient than using the loop concept. To see why and 
for a discussion of the impact of branching instructions on the performance of programs 
in the 386 and higher CPUs, see Chapter 23. 
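
The ADD/ADC pairing described above can be modeled in C by treating each 8-byte operand as two 32-bit halves. This is a sketch of the carry propagation only, not a translation of the book's program; the function name add64_via_32 and its parameters are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two 64-bit numbers held as (hi, lo) pairs of 32-bit halves.
   The low halves are added first (ADD); the carry out of that addition
   is then included when the high halves are added (ADC). */
void add64_via_32(uint32_t a_lo, uint32_t a_hi,
                  uint32_t b_lo, uint32_t b_hi,
                  uint32_t *r_lo, uint32_t *r_hi)
{
    uint32_t lo;
    uint32_t carry;

    assert(r_lo != NULL && r_hi != NULL);
    lo = a_lo + b_lo;                 /* wraps modulo 2^32, like ADD */
    carry = (lo < a_lo) ? 1u : 0u;    /* wrap-around means the carry flag is set */
    *r_lo = lo;
    *r_hi = a_hi + b_hi + carry;      /* like ADC */
}
```

For the operands 548FB9963CE7H and 3FCD4FA23B8DH, the low dwords B9963CE7H and 4FA23B8DH produce 09387874H with a carry, and the high dwords 0000548FH and 00003FCDH plus that carry produce 0000945DH, so the 8-byte sum is 945D09387874H.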




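TITLE    ADD TWO 8-BYTE NUMBERS USING 32-BIT REGISTERS IN THE 386 
PAGE     60,132                       ;(NO-LOOP VERSION) 
         .MODEL SMALL 
         .386 
         .STACK  200H 
         .DATA 
DATA1    DQ      548FB9963CE7H 
         ORG     0010H 
DATA2    DQ      3FCD4FA23B8DH 
         ORG     0020H 
DATA3    DQ      ? 
         .CODE 
BEGIN:   MOV     AX,@DATA 
         MOV     DS,AX 
         MOV     EAX,DWORD PTR DATA1     ;move lower dword of DATA1 into EAX 
         ADD     EAX,DWORD PTR DATA2     ;add lower dword of DATA2 to EAX 
         MOV     EBX,DWORD PTR DATA1+4   ;move upper dword of DATA1 into EBX 
         ADC     EBX,DWORD PTR DATA2+4   ;add upper dword of DATA2 to EBX 
         MOV     DWORD PTR DATA3,EAX     ;store lower dword of result 
         MOV     DWORD PTR DATA3+4,EBX   ;store upper dword of result 
         MOV     AH,4CH 
         INT     21H 
         END     BEGIN 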


Program 8-4 


Combining C with Assembly 


Although Assembly language is the fastest language available for a given CPU, it 
cannot be run on different CPUs. For example, Intel's x86 Assembly programs cannot be 
run on IBM's PowerPC RISC processor based computers since the opcode, mnemonics, 
register names, and size are totally different. Therefore, a portable language is needed. 


Why C? 


Although the dream of a universal language among the peoples of the world is 
still unrealized, C language is becoming the universal language among all the various 
CPUs. Today, a large portion of programs written for all computers are in the C/C++ lan- 
guage. C/C++ is such a universal programming language that it can be run on any CPU 
architecture with little or no modification. It is simply recompiled for that CPU. The fact 
that C/C++ is such a portable language is making it the dominant language of program- 
mers. However, C/C++ is not as fast as Assembly language. Combining C and Assembly 
language takes advantage of C/C++'s portability and Assembly's speed. Today it is very 
common to see a software project written for embedded systems using 90 to 95% C and 
the rest Assembly language. There are two ways to mix C/C++ and Assembly. One is sim- 
ply to insert the Assembly code in C programs, which is commonly referred to as in-line 
assembly. The second method is to make the C/C++ language call an external Assembly 
language procedure. In this section we discuss how to do in-line assembly coding and 
leave the other method to the reader to explore. This section covers Microsoft’s Visual 
C++ and C#. 


Inserting x86 assembly code into Visual C++ programs 


In this section we discuss in-lining with Microsoft C++. For other C compilers, 
consult their C manual. The following code demonstrates how to change the cursor posi- 
tion to row = 10 and column = 20 in a C program. Assembly instructions are prefaced with 
"asm", which is a reserved word. Microsoft uses the keyword "_asm". Note that 
not all interrupts may be supported in the latest versions of Visual C++. The 
following shows two variations of Microsoft format for in-line assembly. 




/* version 1: using keyword _asm before each line of in-line code */ 
/* Microsoft uses keyword "_asm" */ 
/* compiled in Visual C++ 2005 Express Edition - a free download 
   from Microsoft website */ 

#include <iostream> 
#include <windows.h> 
#include <tchar.h> 

using namespace std; 

int _tmain(int argc, _TCHAR* argv[]) 
{ 
    int data1=0xFFFFFF; 
    int data2=0xFFFFFF; 
    int sum; 
    _asm{ 
        mov eax,data1 
        mov ebx,data2 
        add eax,ebx 
        mov sum,eax 
    } 
    cout << sum; 
    return 0; 
} 

As shown above, each line of in-line code is prefaced by the keyword "_asm", or 
a block of in-line code is prefaced by "_asm". Each line must end in a semicolon or new- 
line, and any comments must be in the correct form for C. 


Review Questions 


1. In the 32-bit, the bits of register EDX can be accessed either as DL, bits __ to __; or 
   DH, bits __ to __; or DX, bits __ to __; or EDX, bits __ to __. 
2. True or false: The instruction "MOV DX,[AX]" is illegal in the 8086 but "MOV 
   DX,[EAX]" is legal in the 32-bit x86. 
3. What is the default segment register when EAX is used as a pointer? 
4. What is the purpose of the ".386" directive? 
5. Compare the number of iterations for adding two 8-byte numbers for the following 
   CPUs. 
   (a) 8085 (an 8-bit) CPU 
   (b) 8086/286 
   (c) 386 and higher CPUs (32-bit mode) 
   (d) Itanium and 64-bit x86 supercomputer (64-bit system) 
6. What data directives are used to define 32-bit and 64-bit operands? 
7. What directive is used in MASM to inform the assembler that the program is using 
   32-bit instructions? 

PROBLEMS 


SECTION 8.1: 32-BIT PROGRAMMING IN x86 


1. In an x86 program, show the content of each register indicated in parentheses after 
   execution of the instruction. 
   (a) MOV EAX,9823F4B6H (AL, AH, AX, and EAX) 
   (b) MOV EBX,985C2H (BL, BH, BX, and EBX) 
   (c) MOV EDX,2000000 (DL, DH, DX, and EDX) 
   (d) MOV ESI,120000H (SI, ESI) 
2. Show the destination and its contents in each of the following cases. 
   (a) MOV EAX,299FF94H 
       ADD EAX,34FFFFH 
   (b) MOV EBX,500000 
       ADD EBX,700000 
   (c) MOV EDX,40000000 
       SUB EDX,1500000 
   (d) MOV EAX,39393834H 
       AND EAX,0F0F0F0FH 
   (e) MOV EBX,9FE35DH 
       XOR EBX,0F0F0F0H 
3. Using the little endian convention, show the contents of the destination in each 
   case. 
   (a) MOV [SI],EAX ;ASSUME SI = 2000H AND EAX = 9823F456H 
   (b) MOV [BX],ECX ;ASSUME BX = 348CH AND ECX = 1F23491H 
   (c) MOV EBX,[DI] ;ASSUME DI = 4044H WITH THE 
                    ;FOLLOWING DATA. ALL IN HEX. 
DS:4044 = (92) 
DS:4045 = (6D) 
DS:4046 = (A2) 
DS:4047 = (4C) 


ANSWERS TO REVIEW QUESTIONS 


SECTION 8.1: 32-BIT PROGRAMMING IN x86 


1. 0 to 7, 8 to 15, 0 to 15, 0 to 31 
2. True 
3. DS 
4. Allows use of 386 instructions 
5. (a) 8; (b) 4; (c) 2; (d) 1 
6. DD, DQ 
7. .386 


CHAPTER 9 


8088, 80286 MICROPROCESSORS 


AND ISA BUS 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> State the function of the pins of the 8088 
>> List the functions of the 8088 data, address, and control buses 
>> State the differences in the 8088 microprocessor in maximum 
   mode versus minimum mode 
>> Describe the function of the pins of the 8284 clock generator chip 
>> Describe the function of the pins of the 8288 bus controller chip 
>> Explain the role of the 8088, 8284A, and 8288 in the PC 
>> Explain how bus arbitration between the CPU and DMA is 
   accomplished 
>> State the function of the pins of the 80286 
>> Describe the differences between real and protected modes 
>> Describe the operation of the 80286 data, address, and control buses 
>> Describe the purpose of the expansion slots of the IBM PC AT (ISA) bus 
>> Describe the ISA bus system 


Since the original IBM PCs used 8088 and 80286 microprocessors, this chapter is 
a detailed hardware study of these two microprocessors, as well as the major signals of the 
ISA bus. In Section 9.1, a detailed look at the 8088 CPU, including pin descriptions, is 
provided. Two IC chips that support the 8088, the 8284 clock generator and the 8288 chip, 
are discussed in Section 9.2. Next, the IBM PC address, data, and control buses are cov- 
ered in Section 9.3. In Section 9.4, the 80286 microprocessor is discussed, including pin 
descriptions. Finally, in Section 9.5, the PC’s ISA buses are covered. 


SECTION 9.1: 8088 MICROPROCESSOR 


The first IBM PC used the 8088 microprocessor, and modern PCs still carry that 
legacy. In this section, the function of each pin of the 8088 CPU is described, as well as 
how the microprocessor chip is connected with some simple logic gates to create the 
address, data, and control signals. The 8088 is a 40-pin microprocessor chip that can work 
in two modes: minimum mode and maximum mode. Maximum mode is used when we 
need to connect the 8088 to an 8087 math coprocessor. If we do not need a math 
coprocessor, the 8088 is used in minimum mode. First we look at the 8088 in minimum 
mode since it is much simpler and easier to understand. Maximum mode and supporting 
chips are discussed in Section 9.2. 

In 1978 Intel introduced the 16-bit microprocessor called the 8086. It was 16-bit 
both internally and externally. A year later Intel introduced the 8088 to allow the use of 8- 
bit peripheral chips and to make system boards cheaper. The 8088 is internally identical 
to the 8086, but has only an 8-bit external data bus. Since the original IBM PC introduced 
in 1981 used the 8088, we explore the 8088 instead of the 8086. 


Microprocessor buses 


Every microprocessor-based system must have three sets of separate buses: the 
address bus, the data bus, and the control bus. The address bus provides the path for the 
address to locate the targeted device, while the data bus is used to transfer data between 
the CPU and the targeted device. The control bus provides the signals to indicate the type 
of operation being executed, such as read or write. Next we discuss how these signals are 
provided by the 8088 microprocessor. 


Data bus in 8088 


Figure 9-1 shows the 8088/86 in minimum mode. Pins 9-16 (AD0-AD7) are used 
for both data and addresses in the 8088. At the time of the design of this microprocessor 
in the late 1970s, due to IC chip packaging limitations, there was a great effort to use the 
minimum number of pins for external connections. Therefore, designers multiplexed the 
address and data buses, meaning that Intel used the same pins to carry two sets of infor- 
mation: address and data. Notice that the name of the pins reflects this dual function. In 
the 8088, the address/data bus pins are named AD0-AD7, “AD” standing for 
“address/data.” The ALE (address latch enable) pin signals whether the information on 
pins AD0-AD7 is address or data. Every time the microprocessor sends out an address, it 
activates (sets high) the ALE to indicate that the information on pins AD0-AD7 is the 
address (A0-A7). This information must be latched, then pins AD0-AD7 are used to carry 
data. When data is to be sent out or in, ALE is low, which indicates that AD0-AD7 will 
be used as data buses (D0-D7). This process of separating address and data from pins 
AD0-AD7 is called demultiplexing. 


Address bus in 8088 


The 8088 has 20 address pins (A0-A19), allowing it to address a maximum of one 
megabyte of memory (2^20 = 1M). Pins AD0-AD7 provide the A0-A7 addresses with the 
assistance of a latch. To demultiplex the address signals from the address/data pins, a latch 




Pin  Signal     Signal    Pin 
 1   GND        Vcc       40 
 2   AD14       AD15      39 
 3   AD13       A16       38 
 4   AD12       A17       37 
 5   AD11       A18       36 
 6   AD10       A19       35 
 7   AD9        BHE/S7    34 
 8   AD8        MN/MX     33 
 9   AD7        RD        32 
10   AD6        HOLD      31 
11   AD5        HLDA      30 
12   AD4        WR        29 
13   AD3        IO/M      28 
14   AD2        DT/R      27 
15   AD1        DEN       26 
16   AD0        ALE       25 
17   NMI        INTA      24 
18   INTR       TEST      23 
19   CLK        READY     22 
20   GND        RESET     21 


Figure 9-1. The 8086 and 8088 in Minimum Mode 
(Reprinted by permission of Intel Corporation, Copyright Intel, 1989) 


[Figure 9-2 labels: 12-bit address bus; 8-bit address bus] 



Figure 9-2. Role of ALE in Address/Data Demultiplexing 




must be used to grab the addresses. The most widely used latch is the 74LS373 IC (see 
Figures 9-2 and 9-3). We can also use the 74LS573 chip since it is a variation of the 
74LS373 chip. AD0 to AD7 of the 8088 go into the 74LS373 latch. ALE provides the signal 
for the latching action. For the 8088, the output of the 74LS373 provides the 8-bit address 
A0-A7, while A8-A15 come directly from the microprocessor (pins 2-8 and pin 39). The 
last 4 bits of the address come from A16-A19, pin numbers 35-38. In any system, all 
addresses must be latched to provide a stable, high-drive-capability address bus for the 
system (see Figure 9-5). 
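
The latching action described above can be illustrated with a short simulation. The following Python sketch is only a simplified behavioral model, not a timing-accurate description of the 74LS373; the names TransparentLatch and read_cycle are hypothetical and were chosen for this illustration.

```python
class TransparentLatch:
    """Simplified model of an 8-bit transparent D latch such as the 74LS373."""

    def __init__(self):
        self.q = 0  # value currently held on the outputs

    def update(self, d, enable):
        # While enable (G) is high the outputs follow the inputs;
        # when enable is low the last value stays latched.
        if enable:
            self.q = d & 0xFF
        return self.q


def read_cycle(address, data_from_memory):
    """Return (latched A0-A7, byte seen by the CPU) for one simplified bus cycle."""
    latch = TransparentLatch()
    # Address phase: AD0-AD7 carry the low address byte and ALE is high.
    latch.update(address & 0xFF, enable=True)
    # Data phase: ALE is low and AD0-AD7 now carry data; the latch keeps A0-A7.
    low_address = latch.update(data_from_memory, enable=False)
    return low_address, data_from_memory & 0xFF
```

Calling read_cycle(0x12345, 0x9A) returns the low address byte 45H together with the data byte 9AH, showing that the address survives on the latch outputs even though the same pins have switched to carrying data.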


Figure 9-3. 74LS373 D Latch 
(Reprinted by permission of Texas Instruments, Copyright Texas Instruments, 1988) 


8088 control bus 


There are many control signals associated with the 8088 CPU; however, for now we 
discuss those that deal with read and write operations. The 8088 can access both memory 
and I/O devices for read and write operations. This gives us four operations for which we 
need four control signals: MEMR (memory read), MEMW (memory write), IOR (I/O read), 
and IOW (I/O write). 

The 8088 provides three pins for these control signals: RD, WR, and IO/M. The RD 
and WR pins are both active low. IO/M is low for memory and high for I/O devices. From 
these three pins, four control signals are generated: IOR, IOW, MEMR, and MEMW, as 
shown in Figure 9-4 and listed in Table 9-1. Notice that all of these signals must be active 
low since they go into the RD and WR inputs of memory and peripheral chips that are 
active low. Figure 9-5 shows the use of simple logic gates (inverters and ORs) to generate 
control signals. A CPLD (complex programmable logic device) can be used for the same 
purpose, and that is exactly what chipsets do in today's PCs. 

Table 9-1: Control Signal Generation 

RD   WR   IO/M   Signal 
0    1    0      MEMR 
1    0    0      MEMW 
0    1    1      IOR 
1    0    1      IOW 
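
The gating just described can be checked with a few lines of code. The following Python sketch models the inverter and OR gates as bit operations; it is an illustration only, and the function name control_signals and the _n suffix used for active-low signals are assumptions made for this example.

```python
def control_signals(rd_n, wr_n, io_m):
    """Derive the four active-low control signals from RD, WR, and IO/M.

    All arguments and results are logic levels (0 or 1). Names ending
    in _n are active low, so a result of 0 means the signal is asserted.
    """
    not_io_m = 1 - io_m              # inverter on IO/M
    return {
        "MEMR_n": rd_n | io_m,       # low only when RD = 0 and IO/M = 0
        "MEMW_n": wr_n | io_m,       # low only when WR = 0 and IO/M = 0
        "IOR_n": rd_n | not_io_m,    # low only when RD = 0 and IO/M = 1
        "IOW_n": wr_n | not_io_m,    # low only when WR = 0 and IO/M = 1
    }
```

For a memory read (RD = 0, WR = 1, IO/M = 0) only MEMR_n comes out low; the other three signals stay high, so exactly one device type is selected at a time.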


Figure 9-4. Control Signal Generation 


Bus timing of the 8088 


In Figure 9-6 the timing for ALE is shown. The 8088 uses 4 clocks for memory 
and I/O bus activities. For example, in the read timing, ALE latches the address in the first 
clock cycle. In the second and third clock cycles, the read signal is provided. Finally, by 
the end of the fourth clock cycle the data must be at the pins of the CPU to be fetched in. 
Notice that the entire read or write cycle time is only 4 clock cycles. If the task of read- 
ing or writing takes more than 4 clocks due to the slowness of memory or I/O devices, 
wait states (WS) can be requested from the CPU. This will be demonstrated in Chapter 10. 
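
To get a feel for the numbers, the bus cycle time can be computed from the clock frequency. The following Python sketch assumes that each wait state lengthens the cycle by one clock period; the function name bus_cycle_ns is hypothetical.

```python
def bus_cycle_ns(clock_hz, wait_states=0):
    """Duration of one 8088 bus cycle in nanoseconds.

    A bus cycle takes 4 clocks; each wait state is assumed to add
    one more clock period.
    """
    clocks = 4 + wait_states
    return clocks * 1e9 / clock_hz


# With a 5 MHz clock, one clock period is 200 ns.
no_wait = bus_cycle_ns(5_000_000)       # 4 clocks
one_wait = bus_cycle_ns(5_000_000, 1)   # 5 clocks
```

At 5 MHz this gives 800 ns with no wait states and 1000 ns with one wait state. With the 4.77 MHz clock of the original PC the basic cycle is roughly 838 ns.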


Figure 9-5. Address, Data, and Control Buses in 8088-based System 


Figure 9-6. ALE Timing 


Other 8088 pins 


Pins 24-32 of the 8088 have different functions depending on whether the 8088 
is used in minimum mode or maximum mode. As stated earlier, maximum mode is used 
only when we want to connect the 8088 to an 8087 math coprocessor. In maximum mode, 
the 8088 needs supporting chips to generate the control signals, as described in the next 
section. Table 9-2 lists the functions of pins 24-32 of the 8088 in minimum mode. 




Other pins of the 8088 are described below. 
MN/MX (minimum/maximum) 


Minimum mode is selected by connecting MN/MX (pin number 33) directly to 
+5 V; maximum mode is selected by grounding this pin. 


NMI (nonmaskable interrupt) 


This is an edge-triggered (going from low to high) input signal to the processor 
that will make the microprocessor jump to the interrupt vector table after it finishes the 
current instruction. This interrupt cannot be masked by software, as we will see in Chapter 
14. 


INTR (interrupt request) 


INTR is an active-high level-triggered input signal that is continuously monitored 
by the microprocessor for an external interrupt. This pin and INTA are connected to the 
8259 interrupt controller chip, as we will see in Chapter 14. 


CLOCK 


Microprocessors require a very accurate clock for synchronization of events and 
driving the CPU. For this reason, Intel has designed the 8284 clock generator to be used 
with the 8088 processor. CLOCK is an input and is connected to the 8284 clock genera- 
tor. It acts as the heartbeat of the CPU. Any irregularity causes the CPU to malfunction. 
The 8284 chip is used whether the 8088 is connected in minimum mode or in maximum 
mode. The details of the 8284 chip are covered in the next section. 


READY 


READY is an input signal used to insert a wait state for slower memories and I/O. 
It inserts wait states when it is low. The READY signal is needed to interface the CPU to 
low-speed memories and I/O devices. 


Table 9-2: Pins 24-32 in Minimum Mode 

Pin  Name and Function 

24   INTA (interrupt acknowledge) Active-low output signal. Informs interrupt controller 
     that an INTR has occurred and that the vector number is available on the lower 8 lines 
     of the data bus. 

25   ALE (address latch enable) Active-high output signal. Indicates that a valid address 
     is available on the external address bus. 

26   DEN (data enable) Active-low output signal. Enables the 74LS245. This 
     allows isolation of the CPU from the system bus. 

27   DT/R (data transmit/receive) Active-low output signal used to control the direction of 
     data flow through the 74LS245 transceiver. 

28   IO/M (input-output or memory) Indicates whether the address bus is accessing memory 
     or an I/O device. In the 8088, it is low when accessing memory and high when accessing 
     I/O. This pin is used along with RD and WR pins to generate the four control signals 
     MEMR, MEMW, IOR, and IOW. 

29   WR (write) Active-low output signal. Indicates that the data on the data bus is being 
     written to memory or an I/O device. Used along with signal IO/M (pin 28) to generate 
     the MEMW and IOW control signals for write operations. 

30   HLDA (hold acknowledge) Active-high output signal. After input on HOLD, the 
     CPU responds with HLDA to signal that the DMA controller can use the buses. 

31   HOLD (hold) Active-high input from the DMA controller that indicates that the device 
     is requesting access to memory and I/O space and that the CPU should release 
     control of the local buses. 

32   RD (read) Active-low output signal. Indicates that the data is being read (brought 
     in) from memory or I/O to the CPU. Used along with signal IO/M (pin 28) to generate 
     MEMR and IOR control signals for read operations. 


TEST 


In minimum mode this is not used. In maximum mode, however, this is an input 
from the 8087 math coprocessor to coordinate communications between the two 
processors. 


RESET 


To terminate the present activities of the microprocessor, a high is applied to the 
RESET input pin. A presence of high will force the microprocessor to stop all activity 
and set the major registers to the values shown in Table 9-3. The data in Table 9-3 has 
certain implications in the allocation of memory space to RAM and ROM that we will 
clarify next. 


Table 9-3: IP and Segment Register Contents after Reset 

Register  Contents 
CS        FFFFH 
DS        0000H 
SS        0000H 
ES        0000H 
IP        0000H 

(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1983) 


At what address does the 8088 wake up? 


According to Table 9-3, when power is applied to the 8088, it wakes up at phys- 
ical address FFFF0H, since a CS:IP address of FFFF:0000 leads to a physical address of 
FFFF0H. Therefore, we must have a nonvolatile memory such as ROM at the FFFF0H 
address. This is discussed further in Chapter 10. 
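
The wake-up address follows from the usual real-mode rule of shifting the segment value left by 4 bits and adding the offset. The short Python sketch below simply repeats that arithmetic; the function and variable names are hypothetical.

```python
def physical_address(segment, offset):
    """Real-mode physical address: segment shifted left 4 bits plus offset.

    The result is masked to 20 bits because the 8088 has 20 address lines.
    """
    return ((segment << 4) + offset) & 0xFFFFF


reset_address = physical_address(0xFFFF, 0x0000)   # CS:IP after reset
bytes_to_top = 0x100000 - reset_address            # room left below the 1M boundary
```

Here reset_address evaluates to FFFF0H and bytes_to_top to 16, so only 16 bytes lie between the wake-up address and the top of the 1-megabyte address space.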


Review Questions 


1. Describe the differences between the external data bus of the 8086 and 8088. 

2. In the 8088, pins AD0-AD7 are used for both data and addresses. How does the CPU 
indicate whether the information on these pins is data or an address? 

3. The 8088 memory or I/O read cycles take ____ clock pulses to complete. 

4. If we do not need an 8087 math coprocessor, the 8088 is connected in ____ mode. 

5. Indicate whether each of the following pins are input pins, output pins, or both. 
(a) AD0-AD7 (b) ALE (c) A8-A15 

6. Give the status of the IO/M and RD pins when MEMR is active. 

7. Give the status of the IO/M and WR pins when MEMW is active. 




SECTION 9.2: 8284 AND 8288 SUPPORTING CHIPS 


The original IBM PC introduced in 1981 used the 8088 in maximum mode with 
a socket for the 8087 math coprocessor. In maximum mode, the 8088 requires the use of 
the 8288 to generate some of the control signals. In this section we cover the 8088’s sup- 
porting chips, the 8284 and 8288, and their use in maximum mode. Modern microproces- 
sors such as the Pentium have all these chips incorporated into a single chip. Therefore, 
this section can be skipped unless you are interested in the design of the original PC. 

Figure 9-7 shows the 8086/88 in maximum mode. Comparing Figure 9-7 with 
Figure 9-1, we see that pins 24—32 have different functions. To use the 8088 in maximum 
mode we must use the 8288 supporting chip. We describe the 8288 next and how it is used 
with the 8088 in maximum mode. 


8288 bus controller 


As shown in Figure 9-8, the 8288 is a 20-pin chip specially designed to provide 
all the control signals when the 8088 is in maximum mode. The input and output signals 
are described below. 




Input signals 


S0, S1, S2 (status input) 
Input to these pins comes from the 8088. Depending upon the input from the 
CPU, the 8288 will provide one of the commands or control signals shown in Table 9-4. 


CLK (clock) 


This is input from the 8284 clock generator, providing the clock pulse to the 8288 
to synchronize all command and control signals with the CPU. The 8284 chip is discussed 
later in this section. 


AEN (address enable) 


AEN, an active-low signal, activates the 8288 command output at least 115 ns 
after its activation. In the IBM PC it is connected to the AEN generation circuitry. 


CEN (command enable) 


An active-high signal is used to activate/enable the command signals and DEN. 
In the IBM PC it is connected to the AEN generating circuitry. 


IOB (input/output bus mode) 


An active-high signal makes the 8288 operate in input/output bus mode rather 
than in system bus mode. Since the IBM PC is designed with system buses, it is connect- 
ed to low. 


Output signals 


The following are the output signals of the 8288 bus controller chip. 
MRDC (memory read command) 


This is active low and provides the MEMR (memory read) control signal. It acti- 
vates the selected device or memory to release its data to the data bus. 


Pin  Signal     Signal    Pin 
 1   GND        Vcc       40 
 2   AD14       AD15      39 
 3   AD13       A16/S3    38 
 4   AD12       A17/S4    37 
 5   AD11       A18/S5    36 
 6   AD10       A19/S6    35 
 7   AD9        BHE/S7    34 
 8   AD8        MN/MX     33 
 9   AD7        RD        32 
10   AD6        RQ/GT0    31 
11   AD5        RQ/GT1    30 
12   AD4        LOCK      29 
13   AD3        S2        28 
14   AD2        S1        27 
15   AD1        S0        26 
16   AD0        QS0       25 
17   NMI        QS1       24 
18   INTR       TEST      23 
19   CLK        READY     22 
20   GND        RESET     21 

Figure 9-7. The 8086 and 8088 in Maximum Mode 
(Reprinted by permission of Intel Corporation, Copyright Intel, 1989) 


MWTC (memory write command), AMWC (advanced memory write) 


These two active-low signals are used to tell memory to record the data present 
on the data bus. These two are the same as the MEMW (memory write) signal, the only 
difference being that AMWC is activated slightly earlier in order to give extra time to 
slow devices. 


IORC (I/O read command) 


IORC is an active-low signal that tells the I/O device to release its data to the 
data bus. In the PC it is called the IOR (I/O read) control signal. 


IOWC (I/O write command), AIOWC (advanced I/O write command) 


Both are active-low signals used to tell the I/O device to pick up the data on the 
data bus. AIOWC is available a little bit early to give sufficient time to slow devices. 
AIOWC is unused in the IBM PC. In the PC, IOWC is labeled as IOW. 


Figure 9-8. 8288 Bus Controller 
(Reprinted by permission of Intel Corporation, Copyright Intel, 1983) 


Table 9-4: Status Pins of the 8288 and Their Meaning 

S2  S1  S0  Processor State           8288 Command 
0   0   0   Interrupt acknowledge     INTA 
0   0   1   Read input/output port    IORC 
0   1   0   Write input/output port   IOWC, AIOWC 
0   1   1   Halt                      None 
1   0   0   Code access               MRDC 
1   0   1   Read memory               MRDC 
1   1   0   Write memory              MWTC, AMWC 
1   1   1   Passive                   None 

(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1989) 
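
The decoding performed by the 8288 amounts to a table lookup on the three status bits. The following Python sketch shows one way to express it; the names STATUS_TABLE and decode_status are hypothetical, and the halt, code access, and read memory rows are filled in from the standard 8288 status encoding, which is an assumption wherever those rows are not visible in the table above.

```python
# (S2, S1, S0) -> (processor state, 8288 command outputs that are activated)
STATUS_TABLE = {
    (0, 0, 0): ("Interrupt acknowledge", ("INTA",)),
    (0, 0, 1): ("Read input/output port", ("IORC",)),
    (0, 1, 0): ("Write input/output port", ("IOWC", "AIOWC")),
    (0, 1, 1): ("Halt", ()),
    (1, 0, 0): ("Code access", ("MRDC",)),
    (1, 0, 1): ("Read memory", ("MRDC",)),
    (1, 1, 0): ("Write memory", ("MWTC", "AMWC")),
    (1, 1, 1): ("Passive", ()),
}


def decode_status(s2, s1, s0):
    """Return the processor state and the commands issued for a status code."""
    return STATUS_TABLE[(s2, s1, s0)]
```

For example, decode_status(0, 0, 1) reports an I/O port read with IORC as the only active command, while the passive state (1, 1, 1) activates no command at all.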


INTA (interrupt acknowledge) 


An active-low signal will inform the interrupting device that its interrupt has been 
acknowledged and will provide the vector address to the data bus. In the IBM PC this is 
connected to INTA of the 8259 interrupt controller chip. 


DT/R (data transmit/receive) 


DT/R is used to control the direction of data in and out of the 8088. In the IBM 
PC it is connected to DIR of the 74LS245. When the 8088 is writing data, this signal is 
high and will allow data to go from the A side to the B side of the 74LS245, so that data 
is released to the system bus. Conversely, when the CPU is reading data, this signal is low, 
which allows data to come in from the B to the A side of the 74LS245 data transceiver 
chip so that it can be received by the CPU. 




DEN (data enable) 


An active-high signal will make the data bus either a local data bus or the system 
data bus. In the IBM PC it is used along with a signal from the 8259 interrupt controller 
to activate G of the 74LS245 transceiver. 


MCE/PDEN (master cascade enable/peripheral data enable) 


This is used along with the 8259 interrupt controller in master configuration. In 
the IBM PC the 8259 is used as a slave; therefore, this pin is ignored. 


ALE (address latch enable) 


ALE is an active-high signal used to activate address latches. The 8088 multiplexes 
address and data through AD0-AD7 in order to save pins. In the IBM PC, ALE is connected 
to the G input of the 74LS373, making demultiplexing of the addresses possible. 


8284 clock generator 


The 8284 is used in both minimum and maximum modes since it provides the 
clock and timing for the 8088-based system. Figure 9-9 shows the 8284A, an 18-pin chip 
especially designed for use with the 8088/86 microprocessor. It provides not only the 
clock and synchronization for the microprocessor, but also the READY signal for the 
insertion of wait states into the CPU bus cycle. A description of each pin and how it is con- 
nected in the IBM PC follows. 


Input pins 


RES (reset in) 


This is an input active-low signal to generate RESET. In the IBM PC, it is con- 
nected to the power-good signal from the power supply. When the power switch in the 
IBM PC is turned on, assuming that the power supply is good, a low signal is provided to 
this pin and the 8284 in turn will activate the RESET pin, forcing the 8088 to reset; then 
the microprocessor takes over. This is called a cold boot. 


X1 and X2 (crystal in) 


X1 and X2 are the pins to which a crystal is attached. The crystal frequency must 
be 3 times the desired frequency for the microprocessor. The maximum crystal frequency 
for the 8284A is 24 MHz. The IBM PC is connected to a crystal of 14.31818 MHz. 


F/C (frequency/clock) 


This pin provides an option for the way the clock is generated. If connected to 
low, the clock is generated by the 8284 with the help of a crystal oscillator. If it is con- 
nected to high, it expects to receive clocks at the EFI pin. Since the IBM PC uses a crys- 
tal, this pin is connected to low. 


EFI (external frequency in) 


External frequency is connected to this pin if F/C has been connected to high. In 
the IBM PC this is not connected since a crystal is used instead of an external frequency 
generator. 


CSYNC (clock synchronization) 


This active-high signal is used to allow several 8284 chips to be connected togeth- 
er and synchronized. The IBM PC uses only one 8284; therefore, this pin is connected to 
low. 


RDY1 and AEN1 


RDY1 is active high and AEN1 (address enable) is active low. They are used 
together to provide a ready signal to the microprocessor, which will insert a WAIT state 
to the CPU read/write cycle. In the PC, RDY1 is connected to DMAWAIT, and AEN1 is 
connected to RDY/WAIT. This allows a wait state to be inserted by either the CPU or 
DMA. 




RDY2 and AEN2 


These function exactly like RDY1 
and AEN1 but are designed to allow for a 
multiprocessing system. In the IBM PC, 
RDY2 is connected to low, and AEN2 is con- 
nected to high, which permanently disables 
this function since there is only one 8088 
microprocessor in the system. 


ASYNC 


This is called ready synchronization 
select. An active low is used for devices that 
are not able to adhere to the very strict RDY 
setup time requirement. In the IBM PC this is 
connected to low, making the timing design of 
the system easier with slower logic gates. 


Figure 9-9. 8284A 
(Reprinted by permission of Intel Corporation, Copyright Intel, 1983) 


Output signals 


RESET 


This is an active-high signal that provides a RESET signal to the 8088. It is acti- 
vated by the RES input signal discussed earlier. 


OSC (oscillator) 


This provides a clock frequency equal to the crystal oscillator and is TTL compat- 
ible. Since the IBM crystal oscillator is 14.31818 MHz, OSC will provide this frequency 
to the expansion slot of the IBM PC. 


CLK (clock) 


This is an output clock frequency equal to one-third of the crystal oscillator, or 
EFI input frequency, with a duty cycle of 33%. This is connected to the clock input of the 
8088 and all other devices that must be synchronized with the CPU. In the IBM PC it is 
connected to pin 19 of the 8088 microprocessor and other circuitry under the CLK88 
label. This frequency, 4.772727 MHz (14.31818 divided by 3), is the processor frequen- 
cy on which all of the timing calculations of the memory and I/O cycle are based. 


PCLK (peripheral clock) 


PCLK is one-half of CLK (or one-sixth of the crystal) with a duty cycle of 50% 
and is TTL compatible. In the IBM PC this 2.386363 MHz is provided to the 8253 timer 
to be used to generate speaker tones, and for other functions. 
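
The three clock outputs of the 8284 are related by simple division, which the following Python sketch reproduces for the PC's 14.31818 MHz crystal. The function name clocks_8284 is hypothetical, and the sketch covers only the frequency arithmetic, not the duty cycles.

```python
def clocks_8284(crystal_hz):
    """Return (OSC, CLK, PCLK) frequencies in Hz for a given crystal frequency."""
    osc = crystal_hz        # OSC equals the crystal frequency
    clk = crystal_hz / 3    # CLK is one-third of the crystal
    pclk = clk / 2          # PCLK is one-half of CLK, one-sixth of the crystal
    return osc, clk, pclk


osc, clk, pclk = clocks_8284(14_318_180)
```

Rounded to the nearest hertz, clk is 4,772,727 Hz and pclk is 2,386,363 Hz, which are the CLK88 and PCLK frequencies of the IBM PC.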


READY 


This signal is connected to READY of the CPU. In the IBM PC it is used to sig- 
nal the 8088 that the CPU needs to insert a wait state due to the slowness of the devices 
that the CPU is trying to contact. 


Review Questions 


1. Pin RESET is an ____ (input, output) for the 8284 and an ____ (input, 
output) for the 8088. 

2. True or false. Regardless of whether the 8088 is in minimum or maximum mode, the 
8284 clock generator is needed to provide a reliable clock. 

3. True or false. The 8288 is used to provide control signals for the 8088 when it is in 
minimum mode. 

4. The 8288 output pin ____ controls the direction data flows in and out of the CPU. 




SECTION 9.3: 8-BIT SECTION OF ISA BUS 


Previous sections have explained the 8088 CPU and supporting chips. This sec- 
tion will explain how they are all connected in the original IBM PC to produce the 
required buses to communicate with memory, input/output peripherals, and the 8-bit sec- 
tion of the ISA bus. The study of the 8-bit section of ISA is the main topic of this section. 


A bit of bus history 


The original IBM PC introduced in 1981 used an 8088 microprocessor, whose 8- 
bit data bus gave birth to the 8-bit section of the ISA bus. In 1984 when IBM introduced 
the IBM PC/AT using the 80286 microprocessor, the data bus was expanded to 16 bits. 
The 8-bit data bus can be seen as a subsection of the 16-bit ISA bus. Very often the 8-bit 
data bus was referred to as the IBM PC/XT (extended technology) bus in order to differ- 
entiate it from the IBM PC AT (advanced technology). Eventually the IBM PC AT bus 
became known as the ISA (Industry Standard Architecture) bus since the term “PC AT” 
was copyrighted by IBM. Throughout this book we use the terms PC/XT and PC inter- 
changeably to refer to the 8-bit portion of the IBM PC AT (ISA) bus. The following is the 
description of the three main buses of the IBM PC as generated by the 8088 and support- 
ing chips. 


Local bus vs. system bus 


In the discussion of PC design we often see the terms local bus and system bus. 
The system bus not only provides necessary signals to all the chips (RAM, ROM, and 
peripheral chips) on the motherboard, but also goes to the expansion slot for any plug-in 
expansion card. In contrast, the local bus is connected directly to the CPU. Any 
communication with the CPU must go through the local bus. There is a bridge between the 
local bus and the system bus to make sure they are isolated from each other. Sometimes 
the system bus is referred to as a global bus. We use tri-state buffers to isolate the local 
bus and system bus. For example, 74LS245 is a widely used chip for the data bus buffer 
since it is bidirectional. See Figure 9-10. Figure 9-11 shows an example of local and 
system buses. 

Figure 9-11 gives an overview of the 8088 and its supporting chips as designed in 
the original PC. Notice the role of the 74LS245 and 74LS373 in isolating the local and 
system buses. Everything on the left of the 8288, 74LS373s, and 74LS245 represents the 
local bus and everything on the right side of those chips is the system bus. The 74LS245 
and 74LS373s play the role of bridge to isolate the local and system buses. Now let's look 
at each of the buses. 


Function Table 

Enable G   Direction Control DIR   Operation 
L          L                       B data to A bus 
L          H                       A data to B bus 
H          X                       Isolation 

Figure 9-10. 74LS245 Bidirectional Buffer 
(Reprinted by permission of Texas Instruments, Copyright Texas Instruments, 1988) 


[Figure 9-11 labels: RESET (reset drive), CLK to 8088, control bus, data bus (8-bit data through 74LS245), local bus] 


Figure 9-11. 8088 Connections and Buses in the PC/XT 


(Reprinted by permission from “IBM Technical Reference” c. 1984 by International Business Machines Corporation) 


Address bus 


Three 74LS373 chips in Figure 9-11 are used for two functions: 

1. To latch the addresses from the 8088 and provide stable addresses to the entire com- 
puter. The address bus is a unidirectional bus. The 74LS373 chips are activated by 
control signals AEN and ALE. When AEN is low, the 8088 provides the address buses 
to the system. The 8288’s ALE (connected to G) enables the 74LS373 to latch the 
addresses from the CPU, providing a 20-line stable address to memory, peripherals, 
and expansion slots. Demultiplexing addresses A0-A7 is performed by the 74LS373 
connected to pins AD0-AD7 of the CPU. The CPU's A8-A15 is connected to the sec- 
ond 74LS373, and A16-A19 to the third one. Half of the third 74LS373 is unused. 

2. To isolate the system address buses from local address buses. The system buses must 
be allowed to be used by the DMA or any other board through the expansion slot with- 
out disturbing the CPU. This is achieved by the 74LS373s through AEN. The AEN 
signal is described shortly. 


Data bus 


The bidirectional data bus goes through the 74LS245 transceiver (see 
Figures 9-10 and 9-11). DT/R and DEN are the two signals that activate the 74LS245. DT/R goes 
to DIR of the 74LS245 and makes the transceiver transmit information from the A side to the B 
side when DT/R is high. Conversely, when DT/R makes DIR low, the transceiver transfers infor- 
mation from the B side to the A side, thereby receiving information from the system data bus and 
bringing it to the microprocessor. DEN (an active low signal) enables the 74LS245. This isolates 
the data buses to make them either a local bus or a system bus. When the 74LS245 is not active, 
the system data bus is isolated from the local data bus. 
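
The behavior of the transceiver can be summarized in a few lines of code. The following Python sketch is a simplified functional model of the 74LS245 in which None stands for a high-impedance (isolated) side; the function name and argument names are hypothetical.

```python
def transceiver_74ls245(a_side, b_side, dir_pin, g_n):
    """Return (value driven onto the A side, value driven onto the B side).

    g_n is the active-low enable; dir_pin selects the direction.
    None means that side is not driven (high impedance).
    """
    if g_n == 1:                   # disabled: local and system buses isolated
        return None, None
    if dir_pin == 1:               # A to B: the CPU is writing to the system bus
        return None, a_side & 0xFF
    return b_side & 0xFF, None     # B to A: the CPU is reading from the system bus
```

With the enable low and DIR high, a byte on the A side appears on the B side; with DIR low the flow is reversed; and with the enable high neither side is driven, which is the isolation case described above.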




Figure 9-12. ALE, DEN, and DT/R Timing for the 8088 System 


Control bus 


The four most important control signals of the IBM PC are IOR (I/O read), IOW 
(I/O write), MEMR (memory read), and MEMW (memory write). They are provided by 
the 8288 chip as shown in Figure 9-11. The timing for the bus activity is shown in Figure 
9-12. 


One bus, two masters 


While the 8088, the main processor, is designed for fetching and executing 
instructions, it is unacceptably slow for transferring large numbers of bytes of data such 
as in hard disk data transfers. Instead, the 8237 chip is used for data transfers of large num- 
bers of bytes. The detailed function of this chip is explained in Chapter 15. All that is 
needed here is to know that the 8237's job is to transfer data and it must have access to all 
three buses to do that. Since no bus can serve two masters at the same time, there must be 
a way to allow either the 8088 processor or the 8237 DMA to gain control over the buses. 
This is called bus arbitration and is achieved by the AEN (address enable) generation cir- 
cuitry. 


AEN signal generation 


When the system is turned on, the 8088 CPU is in control of all the buses. The 
CPU maintains control as long as it is fetching and executing instructions. As can be seen 
from Figure 9-13, AEN is the output signal of the D flip-flop. Since Q is either high or 
low, depending on the status of this signal, either the CPU or the DMA can access the 
buses. Table 9-5 shows the role of AEN in bus arbitration. 


Control of the bus by DMA 


How does AEN become high, handing control of the system buses to DMA? The 
answer is that when DMA receives a request for service, it will notify the CPU that it 
needs to use the system buses by putting a LOW on HRQDMA (this is the same as the 
HOLD signal in minimum mode of the 8088). This in turn will provide a high on the D3 
output of the 74LS175, assuming that the current memory cycle is finished and that 
LOCK is not activated. In the following clock cycle, HLDA (hold acknowledge) is pro- 
vided to the DMA and AEN becomes high, giving control over the buses to the DMA. 
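
The sequence of events in this handover can be sketched as a small clock-by-clock simulation. The Python code below is only a behavioral illustration under simplifying assumptions: it is not a gate-level model of the 74LS175 circuit, the request is represented as a plain true/false value regardless of the signal's actual polarity, and the function name arbitrate is hypothetical.

```python
def arbitrate(requests, cycle_done, lock):
    """Step through clock ticks and return the AEN level after each tick.

    requests[i]   -- True while the DMA controller is asking for the buses
    cycle_done[i] -- True when the current CPU bus cycle has finished
    lock[i]       -- True while LOCK is activated
    AEN = 0 means the CPU owns the buses; AEN = 1 means DMA owns them.
    """
    granted = False   # request accepted on the previous tick
    history = []
    for req, done, locked in zip(requests, cycle_done, lock):
        # AEN goes high one tick after the request was accepted.
        aen = 1 if (granted and req) else 0
        # A request is accepted only between bus cycles and without LOCK.
        granted = req and done and not locked
        history.append(aen)
    return history
```

For a request that arrives in the middle of a bus cycle, AEN stays low until the cycle has finished and one further clock has passed, and it drops again once the request is withdrawn.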




Figure 9-13. AEN Generation Circuitry in the PC/XT 
(Reprinted by permission from “IBM Technical Reference” c. 1984 by International Business Machines Corporation) 


Table 9-5: AEN Bus Arbitration 

AEN  Bus Control 
0    Buses controlled by CPU 
1    Buses controlled by DMA 


Bus boosting 


One more point that needs explaining 
is bus boosting of the control, data, and address 
buses to provide sufficiently strong signals to 
drive various IC chips. When a pulse leaves an 
IC chip it can lose some of its strength, depending on how far away the receiving IC chip 
is located. In addition, the more pins a signal is connected to, the stronger the signal must 
be to drive them all. Therefore, the signals must be amplified. Stated another way, every 
pin connected to a given signal has input capacitance, and the capacitances are in paral- 
lel; thus as far as that signal is concerned they are all added together, making one big 
capacitor load. This requires that the signal be strong enough to drive all the inputs (see 
Chapter 26 for more details on this topic). It is common to combine the functions of bus 
isolation and bus boosting into a single chip. For example, 74LS373 chips are used to 
boost the addresses provided by the 8088 microprocessor in addition to the bus isolation 
mentioned earlier. The signals provided by the CPU need boosting since the 8088 is a 
CMOS chip. CMOS has a much lower driving capability than TTL, of which 74LS373s 
are made. Likewise, the 74LS245 is used for both data bus booster and data bus isolation. 
Details of IC interfacing and how 74LS245 chips are used for signal amplification (boost- 
ing) are shown in Chapter 26. 
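
The remark about parallel input capacitances can be made concrete with a small calculation. In the Python sketch below the 5 pF per-input figure is purely illustrative and is not taken from any data sheet; the function name total_load_pf is hypothetical.

```python
def total_load_pf(input_caps_pf):
    """Capacitances in parallel add, so the load is the sum of all inputs."""
    return sum(input_caps_pf)


# Illustrative values only: eight inputs of 5 pF each on one signal line.
load = total_load_pf([5] * 8)
```

With these assumed values the driver sees a 40 pF load, eight times what a single input would present, which is why a heavily loaded signal needs a buffer with strong drive capability.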


8-bit section of the ISA bus 


As stated earlier, the original IBM PC had an 8-bit data bus. Later, with the
introduction of the 80286, the 16-bit version of the bus became available. The 80286 bus
became known as the ISA bus. The 8-bit bus is a subset of the 16-bit ISA bus and is used in
many peripheral boards. Figure 9-14 shows the 8-bit portion of the ISA bus expansion
slot. From that figure notice that addresses A0-A19 and data signals D0-D7 are on the A
side of the expansion slot. On the A side, also notice the AEN pin. On the B side are found
control signals IOR, IOW, MEMR, and MEMW. The - sign on these and other control
signals implies an active-low signal. In Chapter 11 we use signals A0-A19, D0-D7, AEN,
IOR, and IOW to design an I/O interfacing card. The rest of the signals in
Figure 9-14 will be covered in subsequent chapters. The signals associated with
interrupts (IRQs) are covered in Chapter 14; signals associated with DMA (DREQs and
DACKs) are covered in Chapter 15.




REAR PANEL

A-side signal names, in slot order: -I/O CH CK, SD7, SD6, SD5, SD4, SD3, SD2, SD1, SD0,
-I/O CH RDY, AEN, SA19, SA18, SA17, SA16, SA15, SA14, SA13, SA12, SA11, SA10, SA9,
SA8, SA7, SA6, SA5, SA4, SA3, SA2, SA1, SA0


Figure 9-14. ISA Bus Slot Signals Detail (8-bit Section) 


(Reprinted by permission from “IBM Technical Reference” c. 1985 by International Business Machines Corporation) 


Review Questions 


1. The system bus can be accessed either by the CPU or by ______.

2. The control signal that provides bus arbitration is ______.

3. True or false. After a cold boot, DMA is given control of the buses.

4. The bidirectional data bus goes through the 74LS245 transceiver. Signal ______
determines whether data is flowing from the A to the B side or from the B to the A
side.

5. Bus ______ is required to provide strong signals to various IC chips in the
IBM PC.


SECTION 9.4: 80286 MICROPROCESSOR 




The 80286 is a 68-pin microprocessor available in either of two packaging for- 
mats: LCC (leaded chip carrier) and PGA (pin grid array). This is in contrast to the 8088, 
which is a 40-pin DIP (dual in-line package). To package the 68-pin IC in DIP packaging 
would have made it a long IC physically and consequently more fragile. Such packaging 
would also necessitate a longer path for some signals and as a result make it unsuitable for 
use in high-frequency systems. Figure 9-15 shows the 80286 in LCC packaging. 

The 80286 can work in one of two modes: real mode or protected mode. In real
mode, the maximum memory it can access is 1M, 00000H to FFFFFH. To access the
entire 16M bytes of memory, 000000H to FFFFFFH, it must work in protected mode. In
real mode, the 80286 is a faster version of the 8086 with a few new instructions. When
power is applied to the 80286, it starts up in real mode and can be switched to protected
mode at any time through a software instruction. However, to use the 286 in protected
mode requires an extremely complex memory management system. Since very few
systems are using the 286 in protected mode, it is not discussed here (even in protected mode
it is still a 16-bit computer, meaning that all registers are 16-bit, as opposed to 32-bit).


Figure 9-15. 80286 Microprocessor (LCC Packaging)
(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1983)


Pin descriptions 


The following are pin descriptions of the 80286 microprocessor. 
Pins A0-A23 (address bus) 


These output signals provide a 24-bit address to be used by the decoding circuitry
to locate memory or I/O. When providing an address for memory, all 24 pins must be
used (A0-A23); therefore, it can access a maximum of 16M bytes of memory (2^24 =
16M). To access an I/O address, only A0-A15 are used. If the I/O address is a 16-bit
address, A0-A15 are used to provide the address, and pins A16-A23 are low. If the I/O
address is an 8-bit address, only A0-A7 are used, and A8-A23 are all low.
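
The relationship between the number of address pins and the size of the address space can be checked with a few lines of Python. This is only a sketch; the function name is a hypothetical helper introduced here for illustration.

```python
# Sketch: n address pins can select 2**n distinct locations.
# addressable_locations is a hypothetical helper, not part of any library.

def addressable_locations(address_pins):
    """Number of distinct addresses selectable with the given pin count."""
    return 2 ** address_pins

memory_bytes = addressable_locations(24)     # A0-A23 used for memory
io_16bit_ports = addressable_locations(16)   # A0-A15 used for 16-bit I/O addresses
io_8bit_ports = addressable_locations(8)     # A0-A7 used for 8-bit I/O addresses

print(memory_bytes // (1024 * 1024), "M bytes of memory")
print(io_16bit_ports // 1024, "K ports with a 16-bit I/O address")
print(io_8bit_ports, "ports with an 8-bit I/O address")
```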


Pins D0-D15 (data bus)


These pins provide the 16-bit path for data to be transferred in and out of the CPU.
It must be noted that unlike the 8088, the data bus is not multiplexed. The use of separate
pins for address and data results in higher pin counts, but saves time since it eliminates the
need for a demultiplexer. This 2-byte data path to the CPU allows the transfer of data on
both bytes or on either byte, depending on the operation. The 80286 coordinates the
activity on the D0-D15 data bus with the help of A0 and BHE.


Pin BHE (bus high enable) 


This is an active-low output signal used to indicate that data is being transferred
on D8-D15. Table 9-6 shows how BHE and A0 are used to indicate whether the data
transfer is on D0-D7, D8-D15, or the entire bus, D0-D15.


Table 9-6: BHE, A0, and Byte Selection in the 80286

BHE  A0  Data Bus Status
0    0   Transferring 16-bit data on D0-D15
0    1   Transferring a byte on the upper half of data bus D8-D15
1    0   Transferring a byte on the lower half of data bus D0-D7
1    1   Reserved (the data bus is idle)


(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1983) 
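
The BHE and A0 decoding summarized in Table 9-6 can be expressed as a small lookup. The sketch below is illustrative only; the function name and the wording of the returned strings are assumptions, and the pin levels are passed as plain 0/1 integers.

```python
# Sketch of the BHE/A0 decoding of Table 9-6. BHE is active low, so
# bhe == 0 means the upper half of the data bus (D8-D15) is in use.
# data_bus_status is a hypothetical helper introduced for illustration.

def data_bus_status(bhe, a0):
    """Return which part of the data bus carries the transfer."""
    table = {
        (0, 0): "16-bit transfer on D0-D15",
        (0, 1): "byte transfer on upper half D8-D15",
        (1, 0): "byte transfer on lower half D0-D7",
        (1, 1): "reserved (data bus idle)",
    }
    return table[(bhe, a0)]

for bhe in (0, 1):
    for a0 in (0, 1):
        print(f"BHE={bhe} A0={a0}: {data_bus_status(bhe, a0)}")
```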


Pin CLK (clock) 


CLK is an input providing the working frequency for the 80286. The processor 
always works on half of this frequency. For example, if CLK = 16 MHz, the system is an 
8-MHz system. In other words, for the 80286 computer to be an 8-MHz system, the CLK 
must be 16 MHz. 


Pin M/IO (memory/IO select)


M/IO is an output signal used by the CPU to distinguish between I/O and memory
access. When it is high, memory is being accessed, and when it is low, I/O is being addressed.


Pin COD/INTA (code/interrupt acknowledge) 


This is an output signal used by the CPU to indicate whether it is performing 
memory read/write of data or an instruction fetch. It is also used to distinguish between 
the action of interrupt acknowledge and I/O cycle. This signal, along with the status sig- 
nals and M/IO, is used to define the bus cycle. 


Pins S1 and S0 (status signals)


These status signals for the bus cycle are both output signals used by the CPU 
along with M/IO and COD/INTA to define the type of bus cycle. 


Pins HOLD and HLDA (hold and hold acknowledge) 


HOLD and HLDA allow the CPU to control the buses. HOLD is an input signal 
to the 80286 and is active high. It is used by devices such as DMA to request permission 
to use the buses. In response, the CPU activates the output signal HLDA by putting a high 
on it to inform the requesting device that it has released the buses for the device's use. The 
DMA has control over the buses as long as HOLD is high, and in response the CPU keeps 
HLDA high. Whenever the DMA brings HOLD low, the CPU responds by making HLDA 
low, and regains control over the buses. 


RESET pin 


This is an input signal and is active high. When there is a low-to-high transition
on RESET (and it stays high for at least 16 clocks), the 80286 initializes all registers to
their predefined values and the output pins of the 80286 will have the status shown in
Table 9-7. Of the above signals, the status of the following pins must be noted since they
are used in the memory design of the IBM PC AT computers: A20 = 1, A21 = 1,
A22 = 1, and A23 = 1.

Table 9-7: Pin State (Bus Idle) During Reset
(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1983)

As long as the RESET pin is high, no instruction or bus activity is allowed. The
contents of the instruction pointer and segment registers of the 80286 after RESET are
shown in Table 9-8.

Table 9-8: IP and Segment Registers After RESET
(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1983)

It must be noted that when RESET of the 80286 is activated, it forces the 80286 to
enter into real mode. In other words, the CPU wakes up in real mode. In real mode the
80286 (indeed, all the x86s from the 80286 to the Pentium 4) can address only
1 megabyte since it uses only address lines A0-A19. Since RESET also causes A20-A23
to be high, the first instruction for the 286 must be at physical address FFFFF0H. This is
due to the fact that at reset, CS = F000H and IP = FFF0H, making the logical address of
the first instruction F000:FFF0. This provides the physical address of FFFF0H on
A0-A19, and since A20-A23 are high at reset, the physical address of the first instruction
must be FFFFF0H. This is 16 bytes from the top of the 16M address range of the 80286.
The 80286 expects to have a far jump at location FFFFF0H and when the JMP is executed,
the 286 puts 0s on pins A20-A23, making it effectively a 1M range real-mode system.
Further implications of these facts are discussed in Chapter 10.
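
The reset-address arithmetic described above can be verified with a short calculation. This is a minimal sketch under the assumptions stated in the text (CS = F000H, IP = FFF0H, A20-A23 high at reset); the function and variable names are hypothetical.

```python
# Sketch of the 80286 reset-address arithmetic described in the text.
# real_mode_address is a hypothetical helper: the segment is shifted left
# one hex digit (4 bits), the offset is added, and 20 bits are kept (A0-A19).

def real_mode_address(segment, offset):
    """20-bit real-mode physical address appearing on A0-A19."""
    return ((segment << 4) + offset) & 0xFFFFF

low_20_bits = real_mode_address(0xF000, 0xFFF0)

# At reset A20-A23 are all high, which sets the top four address bits.
reset_address = (0xF << 20) | low_20_bits

print(f"Address on A0-A19:       {low_20_bits:05X}H")
print(f"Address on A0-A23:       {reset_address:06X}H")
print(f"Bytes below top of 16M:  {(1 << 24) - reset_address}")
```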


Pin INTR (interrupt request) 


INTR is an input signal into the 80286 requesting suspension of the current pro- 
gram execution. It is used for external hardware interrupt expansion along with the 8259 
interrupt controller chip. See Chapter 14 for more information. 


Pin NMI (nonmaskable interrupt request) 


NMI is an active-high input signal. When this pin is activated, the 80286 will 
automatically perform INT 2, meaning that there is no INTA response since INT 2 is 
assigned to it. See Chapter 14 for more details of this pin. 


READY pin 


READY is an active-low input signal used to insert a wait state and consequent- 
ly prolong the read and write cycle for slow memory and I/O devices. 
Figure 9-16 shows the 80286 timing. Notice the 2-clock cycle time for read. 


[Timing diagram: previous cycle, read cycle, next cycle]


Figure 9-16. ALE, DEN, and DTR Timing for the 80286 CPU 




Review Questions 


1. When power is applied to the 80286, which mode does it wake up in, real mode or 
protected mode? 

2. The 286 can access ______ of memory in real mode and ______ in protected
mode.

3. When RESET is set to high, what are the contents of the CS and IP registers? 


SECTION 9.5: 16-BIT ISA BUS 


The origin of technical specifications of many of today’s x86 PCs is the 80286-based
IBM PC/AT. Much of the PC/AT in turn is based on the original 8088-based IBM
PC introduced in 1981. A major legacy of those original PCs is the ISA (Industry Standard
Architecture) bus slot. Remember that ISA is another name for the PC/AT bus since
PC/AT is a trademark of IBM Corp. In this section we examine the address,
data, and control buses of the ISA expansion bus and some of the issues related to them.
Whether the microprocessor used in a PC is Intel’s Pentium, 386, 486, or an equivalent
AMD processor, if it has an ISA bus slot, the material in this section is relevant and needs
to be understood if you want to design expansion cards for ISA slots.

Figure 9-17 shows the 80286 microprocessor, along with supporting chips used in 
the original PC/AT computers. The address, data, and control buses in this figure are used 
throughout the motherboard and are also provided to the ISA expansion slot. In today’s 
PC the 80286 is replaced with Intel’s Pentium or AMD’s Athlon microprocessor, and all 
the control signals are provided by a chipset. A chipset is an IC chip containing all the 
circuitry needed to support the CPU in a given motherboard. For educational purposes
throughout the book, we use simple logic gates from the original PC to discuss some
design concepts, even though in the real world the chipset uses CPLDs (Complex 
Programmable Logic Devices) for design with all the circuitry details buried inside. Next, 
we examine the major signals of the ISA expansion slot. 


[Block diagram labels: control signals MEMR, MEMW, IOR, IOW; 16-bit data bus; BHE (bus high enable); 24-bit address bus; A20 control; arbitration circuitry]


Figure 9-17. 80286 Block Diagram and Supporting Chips in the PC AT 




Exploring ISA bus signals 


In Section 9.3 we discussed the 8-bit section of the ISA bus. The 8-bit section uses 
a 62-pin connector to provide access to the system buses. In order to maintain compatibil- 
ity with the original PC, the 16-bit ISA slot used the 8-bit section as a subset. A 36-pin 
connector was added to incorporate the new signals as shown in Figure 9-18. In design- 
ing a plug-in peripheral card for the ISA slot we need to understand the basic features of 
the ISA signals. The ISA bus has 24 address pins (A0-A23), 16 data pins (D0-D15), plus 
many control signals. 


Address bus 


Addresses A0-A19 are latched using ALE. These addresses are used throughout
the motherboard and are also provided to the 62-pin part of the ISA slot as SA0-SA19
(system address). See Figure 9-18. Notice that this is already latched and cannot be
latched again by a plug-in card. The A20-A23 part of the address is provided in the
36-pin section. In the 36-pin section of the ISA slot, A17-A23 are also provided as
LA17-LA23 (latchable address). We need to use the ALE signal to latch these addresses
in the design of plug-in cards. The ALE signal is provided as BALE (buffered ALE) and
can be used to latch LA17-LA23.


Data bus 


The data bus is composed of pins D0 to D15. The data bus is buffered by a pair
of 74ALS245 data bus transceivers that are used throughout the motherboard to access
memory and ports. They are also provided at the expansion slot as SD0-SD15 (system
data). However, it must be noted that SD0-SD7 are provided at the 62-pin part in order to
make it compatible with the original 8088-based PC/XT, while SD8-SD15 show up on the
36-pin part. This allows the 16-bit data bus to access any 16-bit peripheral. To select the
upper byte or the lower byte of 16-bit data, we use BHE (bus high enable). BHE is latched
and used on the system board and also provided at the expansion slot under SBHE
(system bus high enable). We will see how to use this pin below.


Memory and I/O control signals 


IOR and IOW are the two control signals used to access ports throughout the sys- 
tem. They show up on the 62-pin section of the ISA expansion slot. This makes them 8088 
PC/XT compatible. We will discuss how these signals are used in peripheral interfacing 
in Chapter 11. 

Signals MEMR, MEMW, SMEMR, and SMEMW are used to access memory. 
There is a reason for duplicate memory read and write signals. To allow access to any 
memory within the range of 16 megabytes, read/write control signals are provided to the 
36-pin section of the ISA expansion slot under the designations of MEMR and MEMW, 
respectively. 

To maintain compatibility with the original 8088-based PC/XT, MEMR and 
MEMW are designated as SMEMR and SMEMW and are provided on the 62-pin part of 
ISA on the same strip as the XT bus systems. In other words, MEMR and MEMW can be 
used to access memory in any location, but to access memory within the 1 megabyte 
range, we must use SMEMR and SMEMW on the 62-pin part of the ISA bus. In this case 
they can be used only to address memory locations 0-FFFFFH. Of course, to allow the
same signals, MEMR and MEMW, from the support chip to show up in two distinct
places with two different names and functions requires some extra logic circuitry. Such
details are buried inside the chipsets in today’s PC.
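
One way to picture the extra logic mentioned above is as a simple address qualification: the S-prefixed strobes follow MEMR and MEMW only when the address falls in the first megabyte. The sketch below is a simplified model under that assumption, not the actual PC/AT circuit; the function name is hypothetical, and each signal is modeled as a boolean meaning "asserted" rather than as an electrical level.

```python
# Simplified model (an assumption, not the actual PC/AT circuitry):
# SMEMR/SMEMW are asserted only for accesses inside 0-FFFFFH, while
# MEMR/MEMW are asserted for accesses anywhere in the 16M range.
# True means "asserted"; on the real bus these strobes are active low.

ONE_MEG = 0x100000

def isa_memory_strobes(address, memr_asserted, memw_asserted):
    """Return which memory strobes would be asserted for this access."""
    in_first_meg = address < ONE_MEG
    return {
        "MEMR": memr_asserted,
        "MEMW": memw_asserted,
        "SMEMR": memr_asserted and in_first_meg,
        "SMEMW": memw_asserted and in_first_meg,
    }

print(isa_memory_strobes(0x0B8000, True, False))   # read below 1M
print(isa_memory_strobes(0x200000, True, False))   # read above 1M
```

In this model an 8-bit card that watches only SMEMR and SMEMW never responds to an access above 1M, which is the behavior the text describes for the 62-pin section.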


Other control signals 


Examining the ISA bus pins in Figure 9-18, we see numerous control signals that 
we have not seen before. The rest of the control signals in Figure 9-18 are related to the 
interrupt and DMA chips. IRQ and DMA signals are covered in subsequent chapters. 




REAR PANEL

62-pin section, A-side signal names in slot order: -I/O CH CK, SD7, SD6, SD5, SD4, SD3,
SD2, SD1, SD0, -I/O CH RDY, AEN, SA19, SA18, SA17, SA16, SA15, SA14, SA13, SA12,
SA11, SA10, SA9, SA8, SA7, SA6, SA5, SA4, SA3, SA2, SA1, SA0

36-pin section signal names in slot order: SBHE, LA23, LA22, LA21, LA20, LA19, LA18,
LA17, -MEMR, -MEMW, SD08, SD09, SD10, SD11, SD12, SD13, SD14, SD15

COMPONENT SIDE


Figure 9-18. ISA (IBM PC AT) Bus Slot Signals 


(Reprinted by permission from “IBM Technical Reference” c. 1985 by International Business Machines Corporation) 




ODD and EVEN bytes and BHE 


Table 9-9: Distinguishing Between Odd and Even Bytes


In the 36-pin section of the ISA bus there is a pin called SBHE that we explain
next. Pin C1 is the same as the BHE pin from the 80286 that we studied in the last
section. The BHE pin has to do with the differences between the 8-bit and 16-bit data
bus CPUs. Like all general-purpose microprocessors, the memory (and I/O) space of
x86 microprocessors is byte addressable. That means that every address location can
provide a maximum of one byte of data. If the CPU has an 8-bit data bus, like the
8088, then the addresses are designated as 0 to FFFFFH, as shown in Figure 9-19.
Notice in Figure 9-19 that the bus width for the data bus is only 8 bits. In other words,
only 8 strips of wire connect the CPU’s data bus to devices such as memory and I/O
ports. Since the vast majority of memory and I/O devices also have an 8-bit data bus
of D0-D7, their interfacing to CPUs with an 8-bit data bus is simple and
straightforward. The CPU’s D0-D7 data bus is connected directly to the D0-D7 data bus of
memory and I/O devices. This is a perfect match. If the CPU has a 16-bit data bus, like
8086/80286/80386SX microprocessors, then the address spaces are designated as
odd and even bytes, as shown in Figure 9-20. In such cases, the D0-D7 byte is
designated as even and the D8-D15 byte as odd. To distinguish between odd and even
bytes, 8086/286/386SX CPUs provide an extra signal called BHE (bus high enable).
BHE, in association with the A0 pin, is used to select the odd or even byte according to
Table 9-9. In Figure 9-20, notice the odd and even banks. They are called odd and even
banks since the memory chips have only an 8-bit data bus of D0-D7 and two IC chips
must be used, one for each byte. Although Figure 9-20 shows only 1 megabyte of memory
space, the concept of odd and even bytes applies to the entire memory and I/O space of
the x86 CPU. This is also the case for the 386 and 486 CPUs with a 32-bit data bus.


Figure 9-19. Memory Byte Addressing in 8088 (8-bit Data Bus)


[Figure 9-20 diagram labels: Odd Bank (BHE = 0), D15-D8]


Figure 9-20. Odd and Even Banks of Memory in 16-bit CPUs (80286) 
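
The split of a byte-addressable space into odd and even banks can be sketched as follows. This is an illustration only: the function name is hypothetical, and the assumption that the remaining address bits (A1 and up) select the location inside each bank is a common arrangement rather than something stated in the text.

```python
# Sketch: mapping a byte address onto the odd/even banks of a 16-bit bus.
# Assumption (labeled): address bit A0 picks the bank, and the remaining
# bits (address >> 1) pick the location inside that bank.

def bank_and_index(address):
    """Return (bank name, data lines used, location within the bank)."""
    if address & 1:
        return "odd", "D8-D15", address >> 1
    return "even", "D0-D7", address >> 1

for addr in (0x00000, 0x00001, 0x00002, 0xFFFFF):
    bank, lines, index = bank_and_index(addr)
    print(f"{addr:05X}H -> {bank} bank, {lines}, location {index:05X}H")
```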




A20 gate and the case of high memory area (HMA) 


A20 is an anomaly associated with 286 and higher microprocessors that needs to
be discussed. In the 8088, when the shifted segment register added to the offset is more
than FFFFFH, the address automatically wraps around and starts at 00000H. However, in
80286 and higher processors in real mode, such a wrap-around will not occur. Instead, the
result will be 100000H or higher, making A20 = 1. The problem is that A20-A23 are
supposed to be activated only when the CPU is in protected mode. To control activation of
A20, IBM used a latch controlled by the keyboard in the original PC AT; however, with the
introduction of PS/2 computers, control of A20 can also be handled by port 92H. One can
use this A20 gate (as it is commonly called) to create a high memory area (HMA). This
concept is important for understanding HMA memory in x86 PCs and is discussed in
Chapter 25. See Examples 9-1 and 9-2 for clarification on this issue. Notice that the
process of enabling and disabling the A20 gate in Figure 9-17 is handled by a piece of
software called the A20 handler, which is provided with MS-DOS and Windows operating
systems.


Example 9-1 


(a) If the A20 gate is enabled, show the highest address that the 286 (and higher
processors) can access while still in real mode.
(b) How far above 1M is this address?


Solution:


(a) To access the highest physical location in real mode, we must have CS = FFFFH and
IP = FFFFH. We shift left the segment register CS and add the offset IP = FFFFH:


    CS shifted left one hex digit     FFFF0H
    adding the offset IP            +  FFFFH
                                     10FFEFH


Therefore, the addresses FFFF0H-10FFEFH are the range that the CPU can access
with CS = FFFFH while it is in real mode. This is a total of 64K bytes.


(b) If the A20 gate is enabled, accessible memory locations above 1M are 100000H to
10FFEFH. This is a total of 65,520 bytes, or 16 bytes short of a full 64K above 1M.


Example 9-2 

Assume that CS = FF25H. Find the lowest and highest physical addresses for 
(a) the 8088 (b) the 80286 

Specify the bit on A20. 


Solution: 


(a) The lowest physical address is FF250H and the highest is 0F24FH (FF250H + FFFFH);
since there are only lines A0-A19 in the 8088, the 1 is dropped.

(b) In the 286 the lowest address is the same as in the 8088, but the highest physical
address is 10F24FH; therefore, A20 = 1.
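
The arithmetic of Examples 9-1 and 9-2 can be reproduced with a short sketch. The function name and the a20_enabled parameter are hypothetical; masking to 20 bits stands in for the 8088 (or a disabled A20 gate), where the carry out of A19 is dropped.

```python
# Sketch of the real-mode address calculation in Examples 9-1 and 9-2.
# physical_address is a hypothetical helper. With a20_enabled=False the
# result is masked to 20 bits, which models the wrap-around of the 8088.

def physical_address(segment, offset, a20_enabled):
    """Shift the segment left one hex digit, add the offset, wrap if needed."""
    address = (segment << 4) + offset
    if not a20_enabled:
        address &= 0xFFFFF
    return address

# Example 9-1: highest real-mode address with the A20 gate enabled.
top = physical_address(0xFFFF, 0xFFFF, a20_enabled=True)
print(f"{top:06X}H, bytes above 1M: {top - 0x100000 + 1}")

# Example 9-2: CS = FF25H, highest address on the 8088 and on the 80286.
on_8088 = physical_address(0xFF25, 0xFFFF, a20_enabled=False)
on_286 = physical_address(0xFF25, 0xFFFF, a20_enabled=True)
print(f"8088: {on_8088:05X}H   80286: {on_286:06X}H   A20 = {(on_286 >> 20) & 1}")
```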


Review Questions 


1. The first IBM PC AT used the ______ microprocessor.

2. What is the advantage of using the 74LS573 chip address latch over the 74LS373?

3. What is the purpose of the A20 handler circuitry?

4. What address area is called the high memory area in the 80286?

5. Since the control signals MEMR and MEMW are available on the 62-pin part of the
expansion slot, why are they duplicated on the 36-pin part?




PROBLEMS 


SECTION 9.1: 8088 MICROPROCESSOR 


1. State the main differences between the 8088 and 8086 pinouts. Are the two chips
interchangeable?

2. ALE is an ______ (input, output) signal for the 8088.

3. What is the maximum number of bytes of memory addressable by the 8088, and why?

4. RESET is an ______ (input, output) signal for the 8088.

5. When the 8088 uses the pins for addresses, they are ______ (input, output,
both in and out), but when they are used for data, they will be ______ (input,
output, both in and out).

6. To use a math coprocessor with the 8088, one must connect the 8088 in ______
(maximum, minimum) mode.

7. True or false. An address must be latched from pins AD0-AD7 in the 8088.

8. Which of the following signals is provided by the 8088 CPU in minimum mode?
(a) INTR (b) ALE (c) WR (d) IO/M

9. What is the advantage of demultiplexing address/data in the 8088 CPU?

10. What is the penalty (disadvantage) in terms of clocks in Problem 9?

11. ALE is activated in which T state?

12. Why are 8086-based systems more expensive compared to 8088-based systems?

13. To use the 8088 with the 8087 math coprocessor, is the minimum/maximum pin
connected to low or high?

14. When the input signal RESET in the 8088 is activated, what are the contents of the IP
and CS registers?

Use the following for the next three problems.

      MEMR*   MEMW*
(a)   0       0
(b)   0       1
(c)   1       0
(d)   1       1

* Active low.

15. Which of the above control signals is activated during the memory read cycle?
16. Which one is activated during memory write?
17. Which of the above absolutely cannot happen at the same time?

SECTION 9.2: 8284 AND 8288 SUPPORTING CHIPS 


18. In maximum mode in an 8088-based system, which chip provides the ALE signal, the
8088 or the 8288?

19. Which of the following signals are provided by the 8288 chip?
(a) IOR (b) RESET (c) IOW
(d) NMI (e) MEMR (f) MEMW

Use the following for the next three problems.

      MEMR*   MEMW*
(a)   0       0
(b)   0       1
(c)   1       0
(d)   1       1

* Active low.

20. Which of the above control signals is activated during the memory read cycle?
21. Which one is activated during memory write?
22. Which of the above absolutely cannot happen at the same time?




SECTION 9.3: 8-BIT SECTION OF ISA BUS 


23. When the computer is RESET, which master takes over, the 8088 or DMA? 

24. To latch all the address bits of the 8088, how many 74LS373 chips are needed? 

25. In the IBM PC, when AEN = 0 it indicates that the ______ (8088 CPU, DMA)
is in charge of the buses. Which controls the buses when AEN = 1?

26. Which chip is used for the following?
(a) bidirectional bus buffering
(b) unidirectional bus buffering

27. In the IBM PC, the 74LS373 is used for which of the following?
(a) address latch (b) isolating the address bus
(c) address bus boosting (d) all of the above

28. Draw a block diagram for the 8088 minimum mode connection to the 74LS373 and
74LS245. (Modify Figure 9-5.)

29. To access the buses for interfacing with the CPU, AEN must be ______ (low, high).

30. In the 74LS245, to allow the transfer of data from side A to side B, DIR = ______
and G = ______.

31. Answer Problem 30 if data is transferred from side B to side A.

32. In the 74LS245, what happens if G = 1 and DIR = 0?

33. The 74LS245 chip is used for ______ (address, data) buses.

34. To allow the passage of data through the 74LS373, G = ______ and OE = ______.
When is the data actually latched?


SECTION 9.4: 80286 MICROPROCESSOR 


35. True or false. The 80286 is available in both LCC and PGA packages. 

36. The sales clerk at the local computer store says that the 80286 has 24 bits for address
and 16 bits for data; therefore, it has 2^24 times 2 bytes = 33,554,432 bytes = 32M
memory space. Is this person right? Give justification for your answer.

37. When A0 = 0 and BHE = 0, which section of the data bus (the high byte or the low
byte or both) is transferring information?

38. When A0 = 0, it makes the address an ______ (odd, even) address.

39. True or false. If CLK is 20 MHz, the 80286-based system is a 10-MHz system.

40. True or false. The entire 16-megabyte memory space of the 80286 is accessible in real
mode.

41. When power is applied to the 80286, it wakes up in ______ (real, protected)
mode.

42. Indicate the contents of CS, IP, DS, SS, and ES when power is applied to the 286. 

43. In what physical address does the 286 look for the first opcode? 

44. Justify your answer in Problem 43. 


SECTION 9.5: 16-BIT ISA BUS 


45. The ISA expansion slot of the 80286 has two parts. How many pins does each part 
have? State also the number of pins for A, B, C. Which side is the component side? 

46. In the ISA bus, which part of the expansion slot provides signals A20-A23? 

47. In the ISA bus, which part of the expansion slot provides signals D8-D15? 

48. Why is D0-D7 provided on the 62-pin part of the expansion slot? 

49. The BHE signal is provided on the ______ (62-pin, 36-pin) section of the ISA bus.
Why?

50. If CS = FC48H and IP = 7652H, find the status of A20 for each member of the x86 
family. 

51. Which of the 8088 and 80286 microprocessors have the BHE pin? 

52. To access memory anywhere in the 16M range, we must use ______ and ______ for
the memory write and memory read control signals. Which part of the ISA bus
provides them?




53. True or false. In the 8088, there is no A20 pin. 


54. True or false. The 62-pin part of the ISA bus is almost the same as the 62-pin expan- 
sion slot of the original PC/XT. 


ANSWERS TO REVIEW QUESTIONS 
SECTION 9.1: 8088 MICROPROCESSOR 


1. In the 8086, pins AD0-AD15 are used for the data bus; the 8088 has an 8-bit external
data bus, pins AD0-AD7.

2. The ALE (address latch enable) pin signals whether the information is data or an
address.

3. 4

4. Minimum

5. (a) both (b) output (c) output

6. IO/M = 0 and RD = 0

7. IO/M = 0 and WR = 0


SECTION 9.2: 8284 AND 8288 SUPPORTING CHIPS 


1. Output, input

2. True

3. False

4. DT/R

SECTION 9.3: 8-BIT SECTION OF ISA BUS 


1. 8237 DMA
2. AEN
3. False
4. DT/R
5. Buffering


SECTION 9.4: 80286 MICROPROCESSOR 


1. Real mode 
2. 1 megabyte, 16 megabytes 
3. CS = F000H and IP = FFF0H


SECTION 9.5: 16-BIT ISA BUS 


1. 80286 
2. The advantage of the 74LS573 is that all outputs are on one side and all inputs on the
other, which reduces noise in high-frequency systems and makes the circuit board
easier to design.

3. The A20 handler circuitry allows control of the A20 address bit by software, thereby 
solving the problem associated with the A20 pin in the 80286. 

4. 100000H-10FFEFH

5. They are provided on the 36-pin part to allow access to extended memory. 




CHAPTER 10 


MEMORY AND MEMORY 
INTERFACING 


OBJECTIVES 
Upon completion of this chapter, you will be able to: 


>> Define the terms capacity, organization, and speed as used in 
semiconductor memories 

>> Calculate the chip capacity and organization of semiconductor 
memory chips 

>> Compare and contrast the variations of ROM: PROM, EPROM, 
EEPROM, Flash EPROM, and mask ROM 

>> Compare and contrast the variations of RAM: SRAM, DRAM, and 
NV-DRAM 

>> Diagram methods of address decoding for memory chips 


>> Diagram the memory map of the IBM PC in terms of RAM, VDR,
and ROM allocation

>> Describe the checksum method of ensuring data integrity in ROM 

>> Describe the parity bit method of ensuring data integrity in DRAM 

>> Describe 16-bit memory design and related issues 




This chapter explores memory and memory interfacing of the x86 PC. We first 
study the basics of semiconductor memory chips, then in Section 10.2 we present memo- 
ry address decoding using simple logic gates. The memory map and memory space allo- 
cation of the PC are discussed in Section 10.3. Section 10.4 explores the issue of data 
integrity in RAM and ROM. Section 10.5 discusses the CPU’s bus cycle time for mem- 
ory and shows how to calculate bus bandwidth. 


SECTION 10.1: SEMICONDUCTOR MEMORIES 


In the design of all computers, semiconductor memories are used as primary stor- 
age for code and data. Semiconductor memories are connected directly to the CPU and 
they are the memory that the CPU first asks for information (code and data). For this rea- 
son, semiconductor memories are sometimes referred to as primary memory. The main 
requirement of primary memory is that it must be fast in responding to the CPU; only 
semiconductor memories can do that. Among the most widely used semiconductor mem- 
ories are ROM and RAM. Before we discuss different types of RAM and ROM, we dis- 
cuss terminology common to all semiconductor memories, such as capacity, organization, 
and speed.


Memory capacity 


The number of bits that a semiconductor memory chip can store is called its chip 
capacity. It can be in units of K bits (kilobits), M bits (megabits), and so on. This must be 
distinguished from the storage capacity of computers. While the memory capacity of a 
memory IC chip is always given in bits, the memory capacity of a computer is given in 
bytes. For example, an article in a technical journal may state that the 64M chip has
become popular. In that case, although it is not mentioned that 64M means 64 megabits, 
it is understood since the article is referring to an IC memory chip. However, if an adver- 
tisement states that a computer comes with 64M memory, since it is referring to a com- 
puter it is understood that 64M means 64 megabytes. 


Memory organization 


Memory chips are organized into a number of locations within the IC. Each loca- 
tion can hold 1 bit, 4 bits, 8 bits, or even 16 bits, depending on how it is designed inter- 
nally. The number of bits that each location within the memory chip can hold is always 
equal to the number of data pins on the chip. How many locations exist inside a memo- 
ry chip depends on the number of address pins. The number of locations within a memo- 
ry IC always equals 2* where x is the number of address pins. Therefore, the total number 
of bits that a memory chip can store is equal to the number of locations times the number 
of data bits per location. To summarize: 


1. Each memory chip contains 2^x locations, where x is the number of address pins on the
chip.

2. Each location contains y bits, where y is the number of data pins on the chip.

3. The entire chip will contain 2^x × y bits, where x is the number of address pins and y
is the number of data pins on the chip.

4. The 2^x × y is referred to as the organization of the memory chip, where x is the
number of address pins and y is the number of data pins on the chip.

5. For 2^x, use Table 10-1 to give the number of locations in K or M units.

6. 2^10 = 1024 = 1K. Notice that in common speech, 1K is 1000 (as in discussing salaries
or distance), but in computer terminology it is 1024.
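
The summary above can be sketched in a few lines of Python. This is only an illustration: the function names `organization` and `address_pins_for` are made up for this sketch and do not appear in the text, and the calculation assumes a chip whose address pins are not multiplexed (SRAM or ROM).

```python
K = 1024  # in computer terminology, 1K = 2^10


def organization(address_pins, data_pins):
    """Return (locations, bits per location, capacity in bits) of a memory chip."""
    locations = 2 ** address_pins           # 2^x locations for x address pins
    capacity_bits = locations * data_pins   # 2^x times y bits in total
    return locations, data_pins, capacity_bits


def address_pins_for(capacity_bits, data_pins):
    """Return the number of address pins, given capacity in bits and data pins."""
    locations = capacity_bits // data_pins
    if locations <= 0 or (locations & (locations - 1)) != 0:
        raise ValueError("the number of locations must be a power of 2")
    return locations.bit_length() - 1       # log2 of an exact power of 2


# 12 address pins and 8 data pins, the chip of Example 10-1
locations, width, capacity = organization(12, 8)
print(f"{locations // K}K x {width}, capacity {capacity // K}K bits")

# A 512K-bit chip with 8 data pins, the chip of Example 10-2
print(address_pins_for(512 * K, 8), "address pins")
```

Both directions of the calculation use the same relation, capacity = 2^x × y, so either the pin counts or the capacity can serve as the starting point.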


Speed 




One of the most important characteristics of a memory chip is the speed at which 
data can be accessed from it. To access the data, the address is presented to the address 
pins, and after a certain amount of time has elapsed, the data shows up at the data pins. 




The shorter this elapsed time, the better, and consequently, the more expensive the
memory chip. The speed of the memory chip is commonly referred to as its access time.
The access time of memory chips varies from a few nanoseconds to hundreds of
nanoseconds, depending on the IC technology used in the design and fabrication.

Table 10-1: Powers of 2

The three important memory characteristics of capac- 
ity, organization, and access time will be used extensively in 
this chapter and throughout the book. Many of these topics 
will be explored in more detail in the context of applications 
in this and future chapters. Table 10-1 serves as a reference for 
the calculation of memory organization. See Examples 10-1 
and 10-2 for clarification. 


ROM (read-only memory) 


ROM is a type of memory that does not lose its con- 
tents when the power is turned off. For this reason, ROM is 
also called nonvolatile memory. There are different types of 
read-only memory, such as PROM, EPROM, EEPROM, Flash 
ROM, and mask ROM. Each is explained next. 


Example 10-1 


A given memory chip has 12 address pins and 8 data pins. Find: 
(a) the organization (b) the capacity 


Solution: 


(a) This memory chip has 4096 locations (2^12 = 4096), and each location can hold 8 bits
of data. This gives an organization of 4096 x 8, often represented as 4K x 8.


(b) The capacity is equal to 32K bits since there is a total of 4K locations and each location 
can hold 8 bits of data. 


Example 10-2 


A 512K memory chip has 8 pins for data. Find: 
(a) the organization (b) the number of address pins for this memory chip 


Solution: 


(a) A memory chip with 8 data pins means that each location within the chip can hold 8 bits 
of data. To find the number of locations within this memory chip, divide the capacity 
by the number of data pins. 512K/8 = 64K; therefore, the organization for this memory 
chip is 64K x 8. 

(b) The chip has 16 address lines since 2^16 = 64K.


PROM (programmable ROM) or OTP ROM 


PROM refers to the kind of ROM that the user can burn information into. In other 
words, PROM is a user-programmable memory. For every bit of the PROM, there exists 
a fuse. PROM is programmed by blowing the fuses. If the information burned into PROM 
is wrong, that PROM must be discarded since internal fuses are blown permanently. For 
this reason, PROM is also referred to as OTP (one-time programmable). The process of 
programming ROM is also called burning ROM and requires special equipment called a 
ROM burner or ROM programmer. 


CHAPTER 10: MEMORY AND MEMORY INTERFACING 2357 


EPROM (erasable programmable ROM) 


EPROM was invented to allow changes in the contents of PROM after it is 
burned. In EPROM, one can program the memory chip and erase it thousands of times. 
This is especially useful during development of the prototype of a microprocessor-based 
project. The only problem with EPROM is that erasing its contents can take up to 20 min- 
utes. All EPROM chips have a window that is used to shine ultraviolet (UV) radiation to 
erase the chip's contents. For this reason, EPROM is also referred to as UV-erasable 
EPROM or simply UV-EPROM. Figure 10-1 shows the pins for a 64K-bit UV-EPROM 
chip. Notice the A0-A12 address pins and O0-O7 (output) for D0-D7 data pins. The OE
(output enable) is for the read signal.


To program a UV-EPROM chip, the following steps must be taken: 


1. Its contents must be erased. To erase a chip, remove it from its socket on the system 
board and place it in EPROM erasure equipment to expose it to UV radiation for 
15-20 minutes. 

2. Program the chip. To program a UV-EPROM chip, place it in the ROM burner (pro- 
grammer). To burn code and data into EPROM, the ROM burner uses 12.5 volts or 
higher, depending on the EPROM type. This voltage is referred to as VPP in the UV- 
EPROM data sheet. 

3. Place the chip back into its socket on the system board. 


As can be seen from the above steps, in the same way that there is an EPROM 
programmer (burner), there is also separate EPROM erasure equipment. The main prob- 
lem, and indeed the major disadvantage of UV-EPROM, is that it cannot be programmed 
while in the system board (motherboard). To find a solution to this problem, EEPROM 
was invented. 


EEPROM (electrically erasable programmable ROM)


EEPROM has several advantages over 
EPROM, such as the fact that its method of era- 
sure is electrical and therefore instant, as opposed 
to the 20-minute erasure time required for UV- 
EPROM. In addition, in EEPROM, one can 
select which byte to be erased, in contrast to UV- 
EPROM, in which the entire contents of ROM 
are erased. However, the main advantage of EEP- 
ROM is the fact that one can program and erase 
its contents while it is still in the system board. It 
does not require physical removal of the memory 
chip from its socket. In other words, unlike UV- 
EPROM, EEPROM does not require an external 
erasure and programming device. To utilize EEP- 
ROM fully, the designer must incorporate into 
the system board the circuitry to program the 
EEPROM, using 12.5 V for VPP. EEPROM with 
VPP of 5-7 V is available, but it is more expen- 
sive. In general, the cost per bit for EEPROM is 
much higher than for UV-EPROM. 

Table 10-2 shows examples of some popular ROM chips and their characteristics. Notice
the patterns of the IC numbers. For example, 27128-20 refers to UV-EPROM that has a
capacity of 128K bits and access time of 200 nanoseconds. The capacity of the memory
chip is indicated in the part number and the access time is given with a zero dropped.
In part numbers, C refers to CMOS technology. While 27xx is for UV-EPROM, 28xx is for
EEPROM. See Figure 10-2.


Figure 10-1. UV-EPROM Chip
(Reprinted by permission of Intel Corporation, Copyright Intel Corp.)


Table 10-2: Examples of ROM Memory Chips

Type        Part No.     Capacity   Org.        Access
UV-EPROM    2716-1       16K        2K x 8
            2716B        16K        2K x 8
            2732A-45     32K        4K x 8      450 ns
            2732A-20     32K        4K x 8      200 ns
            27C32        32K        4K x 8
            2764A-25     64K        8K x 8      250 ns
            27C64-15     64K        8K x 8      150 ns
            27128-20     128K       16K x 8     200 ns
            27C128-25    128K       16K x 8     250 ns
            27256-20     256K       32K x 8     200 ns
            27C256-20    256K       32K x 8     200 ns
            27512-25     512K       64K x 8     250 ns
            27C512-25    512K       64K x 8     250 ns
            27C010-12    1M         128K x 8    120 ns
            27C201-12    2M         256K x 8    120 ns
            27C401-12    4M         512K x 8    120 ns
EEPROM      28C16A-25    16K        2K x 8      250 ns
            2864A        64K        8K x 8
            28C256-15    256K       32K x 8     150 ns
            28C256-25    256K       32K x 8     250 ns
Flash ROM   28F256-20    256K       32K x 8     200 ns
            28F256-15    256K       32K x 8     150 ns
            28F010-20    1M         128K x 8    200 ns
            28F020-15    2M         256K x 8    150 ns


Flash memory 


Since the early 1990s, Flash ROM has become a popular user-programmable 
memory chip, and for good reasons. First, the process of erasure of the entire contents 
takes only a few seconds, or one might say in a flash, hence its name: Flash memory. In 
addition, the erasure method is electrical and for this reason it is sometimes referred to as 
Flash EEPROM. To avoid confusion, it is commonly called Flash ROM. The major dif- 
ference between EEPROM and Flash memory is the fact that when Flash memory's con- 
tents are erased the entire device is erased, in contrast to EEPROM, where one can erase 
a desired section or byte. Although there are some Flash memories recently made avail- 
able in which the contents are divided into blocks and the erasure can be done block by 
block, unlike EEPROM, no byte erasure option is available. Because Flash ROM can be 
programmed while it is in its socket on the system board, it is widely used to upgrade the 
BIOS ROM of the PC or the operating system on Cisco routers. Some designers believe 
that Flash memory will replace the hard disk as a mass storage medium. This would 
increase the performance of computers tremendously, since Flash memory is semiconduc- 
tor memory with access time in the range of 100 ns compared with disk access time in the 
range of tens of milliseconds. For this to happen, Flash memory's program/erase cycles 
must become infinite, just like hard disks. Program/erase cycle refers to the number of 
times that a chip can be erased and programmed before it becomes unusable. At this time, 
the program/erase cycle is 500,000 for Flash and EEPROM, 2000 for UV-EPROM, and 
infinite for RAM and disks. In Table 10-2, notice that the part number for Flash ROM 
uses the 28Fxx designation, where F indicates the Flash type ROM. 
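
The part-number conventions described above (27xx for UV-EPROM, 28xx for EEPROM, 28Fxx for Flash, C for CMOS, and the access time with a zero dropped) can be sketched as a small parser. The function `describe_part` and the regular expression are assumptions made for this illustration; they encode only the rules stated in the text, and an individual data sheet may deviate from them, so the result should be treated as a first guess rather than a specification.

```python
import re

# family prefix, optional C (CMOS) or F (Flash), capacity digits,
# optional revision letter, optional speed suffix
PART_PATTERN = re.compile(r"(27|28)([CF]?)(\d+)[A-Z]?(?:-(\d+))?")


def describe_part(part_number):
    """Apply the naming rules from the text to a ROM part number."""
    match = PART_PATTERN.fullmatch(part_number)
    if match is None:
        raise ValueError(f"unrecognized part number: {part_number}")
    prefix, letter, _digits, speed = match.groups()
    if prefix == "27":
        family = "UV-EPROM"
    elif letter == "F":
        family = "Flash ROM"
    else:
        family = "EEPROM"
    return {
        "family": family,
        "cmos": letter == "C",
        # the suffix is the access time with a zero dropped: -20 means 200 ns
        "access_ns": int(speed) * 10 if speed else None,
    }


for part in ("27128-20", "28C256-25", "28F256-15"):
    print(part, describe_part(part))
```

The capacity digits are matched but not interpreted, since parts such as 27C010 (1M bits) do not follow a simple numeric rule.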




Example 10-3 
For ROM chip 27128, find the number of data and address pins, using Table 10-2. 
Solution: 


The 27128 has a capacity of 128K bits. Table 10-2 also shows that it has 16K x 8 organization,
which indicates that there are 8 pins for data, and 14 pins for address (2^14 = 16K).



Figure 10-2. Pin Configurations for 27xx ROM Family 


Mask ROM 


Mask ROM refers to a kind of ROM whose contents are programmed by the IC 
manufacturer. In other words, it is not a user-programmable ROM. The terminology mask 
is used in IC fabrication. Since the process is costly, mask ROM is used when the needed 
volume is high and it is absolutely certain that the contents will not change. It is common 
practice to use UV-EPROM or Flash for the development phase of a project, and only after 
the code/data have been finalized is mask ROM ordered. The main advantage of mask 
ROM is its cost, since it is significantly cheaper than other kinds of ROM, but if an error 
in the data is found, the entire batch must be thrown away. 


RAM (random access memory) 


RAM memory is called volatile memory since cutting off the power to the IC will 
mean the loss of data. Sometimes RAM is also referred to as RAWM (read and write 
memory), in contrast to ROM, which cannot be written to. There are three types of RAM: 
static RAM (SRAM), dynamic RAM (DRAM), and NV-RAM (nonvolatile RAM). Each 
is explained separately. 


SRAM (static RAM) 


Storage cells in static RAM memory are made of flip-flops and therefore do not 
require refreshing in order to keep their data. This is in contrast to DRAM, discussed 
below. The problem with the use of flip-flops for storage cells is that each cell requires at 
least 6 transistors to build, and the cell holds only 1 bit of data. In recent years, the cells 
have been made of 4 transistors, which is still too many. The use of 4-transistor cells plus 
the use of CMOS technology has given birth to a high-capacity SRAM, but the capacity
of SRAM is far below that of DRAM. Table 10-3 shows some examples of SRAM. SRAMs are
widely used for cache memory, which is discussed in Chapter 22. Figure 10-3 shows the 
pin diagram for the 6116 SRAM chip. The 6116 has an organization of 2K x 8, which 
gives a capacity of 16K bits, as indicated in the part number. The following is a descrip- 
tion of the 6116 SRAM pins. 

A0-A10 are for address inputs, where 11 address lines give 2^11 = 2K.

I/O0-I/O7 are for data I/O, where 8-bit data lines give an organization of 2K x 8.

WE (write enable) is for writing data into SRAM (active low).

OE (output enable) is for reading data out of SRAM (active low).

CS (chip select) is used to select the memory chip.

The functional diagram for the 6116 SRAM is given in Figure 10-4. 


Figure 10-5 shows the following steps to
write data into SRAM.
1. Provide the addresses to pins A0-A10.
2. Activate the CS pin.
3. Make WE = 0 while OE = 1.
4. Provide the data to pins I/O0-I/O7.
5. Make WE = 1 and data will be written into
SRAM on the positive edge of the WE signal.


The following are steps to read data
from SRAM. See Figure 10-6.
1. Provide the addresses to pins A0-A10. This
is the start of the access time (tAA).
2. Activate the CS pin.
3. While WE = 1, a high-to-low pulse on the
OE pin will read the data out of the chip.
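
The write and read sequences above can be mimicked with a small behavioral model. This is a sketch only: the class `Sram6116` and its methods are hypothetical names for this illustration, timing (setup, hold, and access time) is ignored, and the only behavior modeled is the one stated in the text, namely that data is written on the positive edge of WE and read out when CS and OE are low while WE is high.

```python
class Sram6116:
    """Behavioral sketch of a 2K x 8 SRAM with active-low CS, WE, and OE."""

    def __init__(self):
        self.cells = [0] * 2048   # 2^11 locations, 8 bits each
        self.cs = 1               # 1 means inactive for the active-low pins
        self.we = 1
        self.oe = 1
        self.address = 0
        self.data_in = 0

    def set_pins(self, address=None, data_in=None, cs=None, we=None, oe=None):
        """Apply new pin levels; a 0-to-1 edge on WE stores data_in."""
        old_we = self.we
        if address is not None:
            self.address = address & 0x7FF   # A0-A10
        if data_in is not None:
            self.data_in = data_in & 0xFF    # I/O0-I/O7
        if cs is not None:
            self.cs = cs
        if we is not None:
            self.we = we
        if oe is not None:
            self.oe = oe
        if self.cs == 0 and old_we == 0 and self.we == 1:
            self.cells[self.address] = self.data_in

    def data_out(self):
        """Return the byte on the data pins, or None when outputs are off."""
        if self.cs == 0 and self.we == 1 and self.oe == 0:
            return self.cells[self.address]
        return None


chip = Sram6116()
# Write cycle: address, CS low, WE low with OE high, data, then WE high
chip.set_pins(address=0x123, cs=0)
chip.set_pins(we=0, oe=1)
chip.set_pins(data_in=0x5A)
chip.set_pins(we=1)
# Read cycle: address, CS low, WE high, OE low
chip.set_pins(address=0x123, cs=0, we=1, oe=0)
print(hex(chip.data_out()))
```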


In the 6116 SRAM, the access time, tAA, is measured as the time elapsed from the
moment the address is provided to the address pins to the moment that the data is
available at the data pins. The speed for the 6116 chip can vary from 100 ns to 15 ns.


[Block diagram labels: address decoder, 128 x 128 memory array, input data circuit]

Figure 10-4. Functional Block Diagram for 6116 SRAM 




Examine the read cycle time for SRAM in Figure 10-6. The read cycle time (tRC)
is defined as the minimum amount of time required to read one byte of data, that is, from
the moment we apply the address of the byte to the moment we can begin the next read
operation. In SRAM for which tAA = 100 ns, tRC is also 100 ns. This implies that we can
read the contents of consecutive address locations with each taking no more than 100 ns.
Hence, in SRAM and ROM, tAA = tRC. They are not equal in DRAM, as we will find in
Chapter 22.


[Timing diagram signals: Address, WE, Data in (data valid), with data setup and data hold times]

Figure 10-5. Memory Write Timing for SRAM 


[Timing diagram signals: Address (address valid), Data out, with tRC marked]


Figure 10-6. Memory Read Timing for SRAM 


DRAM (dynamic RAM) 


Since the early days of the computer, the need for huge, inexpensive read/write 
memory was a major preoccupation of computer designers. In 1970, Intel Corporation 
introduced the first dynamic RAM (random access memory). Its density (capacity) was 
1024 bits and it used a capacitor to store each bit. The use of a capacitor as a means to 
store data cuts down the number of transistors needed to build the cell; however, it 
requires constant refreshing due to leakage. This is in contrast to SRAM (static RAM), 
whose individual cells are made of flip-flops. Since each bit in SRAM uses a single flip- 
flop and each flip-flop requires 6 transistors, SRAM has much larger memory cells and 
consequently lower density. The use of capacitors as storage cells in DRAM results in 
much smaller net memory cell size. 

The advantages and disadvantages of DRAM memory can be summarized as fol- 
lows. The major advantages are high density (capacity), cheaper cost per bit, and lower 
power consumption per bit. The disadvantage is that it must be refreshed periodically, due 
to the fact that the capacitor cell loses its charge; furthermore, while it is being refreshed, 
the data cannot be accessed. This is in contrast to SRAM's flip-flops, which retain data as 
long as the power is on, which do not need to be refreshed, and whose contents can be 
accessed at any time. Since 1970, the capacity of DRAM has exploded. After the 1K-bit
(1024) chip came the 4K-bit in 1973, and then the 16K chip in 1976. The 1980s saw the
introduction of 64K, 256K, and finally 1M and 4M memory chips. The 1990s saw the 
16M, 64M, and 256M DRAM chips. In 1980 when the IBM PC was being designed, 16K- 
bit chips were widely used, but currently motherboards use 64M, 256M, and 1G chips. 
Keep in mind that when talking about IC memory chips, the capacity is always assumed 
to be in bits. Therefore, a 1G chip means a 1-gigabit chip and a 256M chip means a 256M- 
bit chip. However, when talking about the memory of a computer system, it is always 
assumed to be in bytes. For example, if we say that a PC motherboard has 1G, it means 
1G bytes of memory. 


Packaging issue in DRAM 


In DRAM it is difficult to pack a large number of cells into a single chip with the 
normal number of pins assigned to addresses. For example, a 64K-bit chip (64K x 1) must 
have 16 address lines and 1 data line, requiring 16 pins to send in the address if the con- 
ventional method is used. This is in addition to VCC power, ground, and read/write con- 
trol pins. Using the conventional method of data access, the large number of pins defeats 
the purpose of high density and small packaging, so dearly cherished by IC designers. 
Therefore, to reduce the number of pins needed for addresses, multiplexing/demultiplex- 
ing is used. The method used is to split the address into halves and send in each half of 
the address through the same pins, thereby requiring fewer address pins. Internally, the 
DRAM structure is divided into a square of rows and columns. The first half of the address 
is called the row and the second half is called the column. For example, in the case of 
DRAM of 64K x 1 organization, the first half of the address is sent in through the 8 pins
A0-A7, and by activating RAS (row address strobe), the internal latches inside DRAM 
grab the first half of the address. After that, the second half of the address is sent in 
through the same pins and by activating CAS (column address strobe), the internal latch- 
es inside DRAM latch this second half of the address. This results in using 8 pins for 
addresses plus RAS and CAS, for a total of 
10 pins, instead of the 16 pins that would be 
required without multiplexing. To access a 
bit of data from DRAM, both row and col- 
umn addresses must be provided. For this 
concept to work, there must be a 2-by-1 mul- 
tiplexer outside the DRAM circuitry while 
the DRAM chip has its own internal demul- 
tiplexer. Due to the complexities associated 
with DRAM interfacing (RAS, CAS, the 
need for external multiplexer and refreshing 
circuitry), many small microprocessor-based 
projects that do not require much RAM use 
SRAM instead of DRAM. Figure 10-7 shows the pins for a DRAM chip. Notice the
RAS and CAS pins. Also notice the WE (write enable) pin for read and write actions.
Table 10-3 provides some examples of DRAM chips.


Figure 10-7. 256K x 1 DRAM
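
The row/column multiplexing described above can be illustrated with a short sketch. The names `split_address` and `Dram64Kx1` are invented for this example. The text does not say which half of the address is sent first, so treating the upper 8 bits as the row is an assumption of this sketch; refresh and timing are not modeled.

```python
def split_address(address, address_bits=16):
    """Split a full address into (row, column) halves for a multiplexed DRAM."""
    half = address_bits // 2
    mask = (1 << half) - 1
    row = (address >> half) & mask   # assumed: upper half is sent first, with RAS
    column = address & mask          # assumed: lower half is sent second, with CAS
    return row, column


class Dram64Kx1:
    """Behavioral sketch of a 64K x 1 DRAM with 8 multiplexed address pins."""

    def __init__(self):
        self.cells = [[0] * 256 for _ in range(256)]  # 256 rows by 256 columns
        self.row = 0
        self.column = 0

    def ras(self, pins):
        """Activating RAS latches the value on A0-A7 as the row address."""
        self.row = pins & 0xFF

    def cas(self, pins):
        """Activating CAS latches the value on A0-A7 as the column address."""
        self.column = pins & 0xFF

    def write(self, bit):
        self.cells[self.row][self.column] = bit & 1

    def read(self):
        return self.cells[self.row][self.column]


dram = Dram64Kx1()
row, column = split_address(0xABCD)
dram.ras(row)      # first half of the address through the 8 pins
dram.cas(column)   # second half through the same 8 pins
dram.write(1)
print(hex(row), hex(column), dram.read())
```

In a real system the splitting is done by the external 2-by-1 multiplexer mentioned in the text, not by software.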


DRAM, SRAM, and ROM organizations 


Although the organizations for SRAMs and ROMs are always x 8, DRAM can 
have x 1, x 4, x 8, or even x 16 organizations. In some memory chips (notably SRAM), 
the data pins are called I/O. In some DRAMs there are separate pins Din and Dout. The 
DRAMs with x1 organization are widely used for parity bit as we will soon see in this 
chapter. See Examples 10-4 and 10-5. 




Table 10-3: Examples of RAM Chips 


Pins 


16K x 4
4464-8    80 ns    256K    64K x 4


16 


DS1225 
DS1230 256K 32K x 8 


* LP indicates low power. 


Example 10-4 


Show possible organizations and number of address pins for the: (a) 256K DRAM chip, and 
(b) 1M DRAM chip. 


Solution: 


(a) For 256K chips, possible organizations are 256K x 1 or 64K x 4. In the case of 256K x 1,
there are 256K locations and each location inside DRAM provides 1 bit. The 256K
locations are accessed through the 18-bit address A0-A17 since 2^18 = 256K. The chip
has only A0-A8 physical pins plus RAS and CAS and one pin for data in addition to
VCC, ground, and the R/W pin that every DRAM chip must have. For 64K x 4, it
requires 16 address bits to access each location (2^16 = 64K), and each location inside
the DRAM has 4 cells. That means that it must have 4 data pins, D0-D3, 8 address
pins, A0-A7, plus RAS and CAS.


(b) In the case of a 1M chip, there can be either 1M x 1 or 256K x 4 organizations. For
1M x 1, there are A0-A9, 10 pins, to access 2^20 = 1M locations with the help of RAS
and CAS and one pin for data. The 256K x 4 has 9 (A0-A8) and 4 (D0-D3) pins,
respectively, for address and data plus RAS and CAS pins.


Example 10-5 


Discuss the number of pins set aside for addresses in each of the following memory chips. 
(a) 16K x 4 DRAM (b) 16K x 8 SRAM 


Solution: 

Since 2^14 = 16K:

(a) For DRAM we have 7 pins (A0-A6) for the address pins and 2 pins for RAS and CAS.
(b) For SRAM we have 14 pins (A0-A13) for address and no pins for RAS and CAS since
they are associated only with DRAM.
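
The pin counts in Examples 10-4 and 10-5 follow one rule, sketched below. The function name `address_pin_count` is hypothetical, and rounding up when the number of address bits is odd is an assumption of this sketch; the examples in the text only involve even bit counts.

```python
def address_pin_count(locations, multiplexed):
    """Return the number of physical address pins for a chip.

    locations   -- number of locations, assumed to be a power of 2
    multiplexed -- True for DRAM (row and column share pins), False for SRAM/ROM
    """
    address_bits = locations.bit_length() - 1   # log2 of a power of 2
    if multiplexed:
        return (address_bits + 1) // 2          # half the bits, sent twice
    return address_bits


K = 1024
for name, locations, multiplexed in [
    ("16K x 4 DRAM", 16 * K, True),
    ("16K x 8 SRAM", 16 * K, False),
    ("256K x 1 DRAM", 256 * K, True),
    ("1M x 1 DRAM", 1024 * K, True),
]:
    pins = address_pin_count(locations, multiplexed)
    extra = " plus RAS and CAS" if multiplexed else ""
    print(f"{name}: {pins} address pins{extra}")
```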




NV-RAM (nonvolatile RAM) 


While both DRAM and SRAM are volatile, there is a new type of RAM called 
NV-RAM, nonvolatile RAM. Like other RAMs, it allows the CPU to read and write to it; 
but when the power is turned off, the contents are not lost, just as for ROM. NV-RAM 
combines the best of RAM and ROM: the read and writability of RAM, plus the non- 
volatility of ROM. To retain its contents, every NV-RAM chip internally is made of the 
following components: 


1. It uses extremely power-efficient (extremely low power consumption) SRAM cells 
built out of CMOS. 

2. It uses an internal lithium battery as a backup energy source. 

3. It uses an intelligent control circuitry. The main job of this control circuitry is to mon- 
itor the VCC pin constantly to detect loss of the external power supply. If the power 
to the VCC pin falls below out-of-tolerance conditions, the control circuitry switches 
automatically to its internal power source, the lithium battery. In this way, the inter- 
nal lithium power source is used to retain the NV-RAM contents only when the exter- 
nal power source is off. 


It must be emphasized that all three of the components above are incorporated 
into a single IC chip, and for this reason nonvolatile RAM is much more expensive than 
SRAM as far as cost per bit is concerned. Offsetting the cost, however, is the fact that it 
can retain its contents up to ten years after the power has been turned off and allows one 
to read and write exactly the same as in SRAM. See Table 10-3 for NV-RAM parts made 
by Dallas Semiconductor. In the x86 PC, NV-RAM is used to save the system setup. This 
NV-RAM in PC is commonly referred to as CMOS RAM. 


Review Questions 


1. The speed of semiconductor memory is in the range of ______.
2. Find the organization and chip capacity for each of the following with the indicated
number of address and data pins.
(a) 11 address, 8 data SRAM (b) 13 address, 8 data ROM
(c) 8 address, 4 data DRAM (d) 9 address, 1 data DRAM
3. Find the capacity and number of pins set aside for address and data for memory chips
with the following organizations.
(a) 16K x 8 SRAM (b) 32K x 8 EPROM (c) 1M x 1 DRAM
(d) 256K x 4 DRAM (e) 64K x 8 EEPROM (f) 1M x 4 DRAM
4. Why is Flash memory preferable to UV-EPROM in system development? 
5. What kind of memory is used in the CMOS RAM of the x86 PC? 


SECTION 10.2: MEMORY ADDRESS DECODING 


Current system designs use CPLDs (complex programmable logic devices), in 
which memory and address decoding circuitry are integrated into one programmable chip. 
However, it is still important to understand how this task can be performed with common 
logic gates. In this section we show how to use simple logic gates to accomplish address 
decoding. The CPU provides the address of the data desired, but it is the job of the decod- 
ing circuitry to locate the memory chip where the desired data is stored. To explore the 
concept of decoding circuitry, we look at the use of NAND and 74LS138 chips as 
decoders. In this discussion we use SRAM or ROM for the sake of simplicity. 


Simple logic gate as address decoder 


As seen in the last section, memory chips have one or more CS (chip select) pins 
that must be activated for the memory's contents to be accessed. Sometimes the chip select 
is also referred to as chip enable (CE). In connecting a memory chip to the CPU, the data 




Figure 10-8. Using Simple Logic Gate as Decoder 


A19                      A0
0000 1000 0000 0000 0000 = 08000H address of the first location
0000 1111 1111 1111 1111 = 0FFFFH address of the last location


Figure 10-9. Address Range Assigned to Memory Chip in Figure 10-8


A19                      A0
1001 0000 0000 0000 0000 = 90000H address of the first location
1001 1111 1111 1111 1111 = 9FFFFH address of the last location


Figure 10-10. Decoder and Its Associated Address Range 




bus is connected directly to the data pins of the memory. Control signals MEMR and 
MEMW are connected to the OE and WR pins of the memory chip, respectively (see 
Figure 10-8). In the case of the address buses, while the lower bits of the address go 
directly to the memory chip address pins, the upper ones are used to activate the CS pin 
of the memory chip. It is the CS pin along with RD/WR that allows the flow of data in or 
out of the memory chip. In other words, no data can be written into or read from the mem- 
ory chip unless CS is activated. The CS input is active low and can be activated using 
some simple logic gates, such as NAND and inverters. See Figures 10-8, 10-9, and 10-10. 
Example 10-6 shows the address range calculation for Figure 10-10. 


Example 10-6 


Referring to Figure 10-10 we see that the memory chip has 64K bytes of space. Show the cal- 
culation that verifies that address range 90000 to 9FFFFH is comprised of 64K bytes. 
Solution: 

To calculate the total number of bytes for a given memory address range, subtract the two
addresses and add 1 to get the total bytes in hex. Then the hex number is converted to decimal
and divided by 1024 to get K bytes.


  9FFFF        0FFFF
- 90000      +     1
  0FFFF        10000 hex = 65,536 decimal = 64K
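
The same subtraction can be checked quickly in Python. The helper name `block_size` is chosen only for this sketch.

```python
def block_size(first, last):
    """Return the number of bytes in the address range first..last, inclusive."""
    return last - first + 1   # add 1 because both end addresses are counted


size = block_size(0x90000, 0x9FFFF)
print(f"{size:X} hex = {size} decimal = {size // 1024}K")
```

The same helper applies to any other range in this chapter, such as 00000 to 9FFFFH.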


In Figure 10-10, notice that the output of the NAND gate is active low and that the CS
pin is also active low. That makes them a perfect match. Also notice that A19-A16 must
equal 1001 in order for CS to be activated. This results in the assignment of addresses
90000H to 9FFFFH to this memory block. Figures 10-8 and 10-10 show that for every block
of memory, we need a NAND gate. The 74LS138 has 8 NAND gates in it; therefore, a single
chip can control 8 blocks of memory. This was the method of memory address decoding used
before the introduction of CPLD, and it is still the best method if you do not have
access to CPLD.

Using the 74LS138 as decoder 
In the absence of CPLD or FPGA as address decoders, the 74LS138 chip is an excellent
choice. The 3 inputs A, B, and C of the 74LS138 generate 8 active-low outputs Y0-Y7,
as shown in Figure 10-11. Each Y output is connected to the CS of a memory chip,
allowing control of 8 memory blocks by a single 74LS138. This eliminates the need for
using NAND and inverter gates. As shown in Figure 10-11, where A, B, and C select which
output is activated, there are three additional inputs, G2A, G2B, and G1, that can be
used for address or control signal selection. Notice that G2A and G2B are both active
low, while G1 is active high. If any one of the inputs G1, G2A, or G2B is not connected,
it must be activated permanently by either VCC or ground, depending on the activation
level.


Function Table

 Enable    Select     Outputs
 G1  G2    C  B  A    Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7
 X   H     X  X  X    H  H  H  H  H  H  H  H
 L   X     X  X  X    H  H  H  H  H  H  H  H
 H   L     L  L  L    L  H  H  H  H  H  H  H
 H   L     L  L  H    H  L  H  H  H  H  H  H
 H   L     L  H  L    H  H  L  H  H  H  H  H
 H   L     L  H  H    H  H  H  L  H  H  H  H
 H   L     H  L  L    H  H  H  H  L  H  H  H
 H   L     H  L  H    H  H  H  H  H  L  H  H
 H   L     H  H  L    H  H  H  H  H  H  L  H
 H   L     H  H  H    H  H  H  H  H  H  H  L

 G2 = G2A OR G2B (H if either input is H)

Figure 10-11. 74LS138 Decoder
(Reprinted by permission of Texas Instruments, Copyright Texas Instruments, 1988)


Address range C0000-CFFFF is assigned to Y4. Each Y controls one block.


Figure 10-12. 74LS138 as Decoder


In Figure 10-12, we have A0-A15 going from the CPU directly to A0-A15 of the
memory chip. A16-A18 are used for the A, B, and C inputs of the 74LS138. A19 is
controlling the G1 pin of the 74138. For the 74138 to be enabled, we need G2A = 0, G2B =
0, and G1 = 1. G2A and G2B are grounded. When G1 = 1, this 74138 is selected.
Depending on the status of pins A, B, and C, one of the Ys is selected. To select Y4, we
need CBA = 100 (in binary). That gives us the address range of C0000 to CFFFFH for
the memory chip controlled by the Y4 output. For further clarification, see
Example 10-7 and Figure 10-13.
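
The behavior of the decoder and the wiring described for Figure 10-12 can be sketched in Python. The function names `decoder_74138` and `selected_output` are invented for this illustration; the wiring (A19 to G1, A18-A16 to C, B, A, with G2A and G2B grounded) is taken from the paragraph above.

```python
def decoder_74138(g1, g2a, g2b, c, b, a):
    """Return the levels of Y0-Y7 (0 = active) for a 3-to-8 decoder."""
    outputs = [1] * 8                       # all outputs high when disabled
    if g1 == 1 and g2a == 0 and g2b == 0:   # G1 active high, G2A/G2B active low
        outputs[(c << 2) | (b << 1) | a] = 0
    return outputs


def selected_output(address):
    """Return the index of the active Y output for a 20-bit address, or None."""
    outputs = decoder_74138(
        g1=(address >> 19) & 1,   # A19 drives G1
        g2a=0,                    # grounded
        g2b=0,                    # grounded
        c=(address >> 18) & 1,
        b=(address >> 17) & 1,
        a=(address >> 16) & 1,
    )
    return outputs.index(0) if 0 in outputs else None


for address in (0x7FFFF, 0x80000, 0xC0000, 0xCFFFF, 0xD0000):
    print(f"{address:05X}H -> Y{selected_output(address)}")
```

Addresses below 80000H leave G1 low, so no output is activated and the sketch reports `None` for them.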


Example 10-7 


Looking at the design in Figure 10-13, find the address range for (a) Y4, (b) Y2, and (c) Y7, and 
verify the block size controlled by each Y. 


Solution: 


(a) The address range for Y4 is calculated as follows.

A19 A18 A17 A16 A15 A14 A13 A12 A11 A10 A9  A8  A7  A6  A5  A4  A3  A2  A1  A0
 1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 1   1   1   1   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1


The above shows that the range for Y4 is F0000H to F3FFFH. In Figure 10-13, notice that A19,
A18, and A17 must be 1 for the decoder to be activated. Y4 will be selected when A16 A15 A14
= 100 (4 in binary). The remaining A13-A0 will be 0 for the lowest address and 1 for the
highest address.


(b) The address range for Y2 is E8000H to EBFFFH.

A19 A18 A17 A16 A15 A14 A13 A12 A11 A10 A9  A8  A7  A6  A5  A4  A3  A2  A1  A0
 1   1   1   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 1   1   1   0   1   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1


(c) The address range for Y7 is FC000H to FFFFFH. Notice that FFFFFH - FC000H = 3FFFH,
which is equal to 16,383 in decimal. Adding 1 to it because of the 0 location, we have 16,384.
16,384/1024 = 16K, the block (chip) size.
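
The ranges in this example can be generated for all eight outputs at once. This sketch assumes the wiring stated in the solution (A19, A18, and A17 must be 1 to enable the decoder, A16 A15 A14 select the output, and A13-A0 go to the memory chip); the function name `y_range` is hypothetical.

```python
def y_range(n):
    """Return (first, last) address of the block selected by output Yn."""
    first = (0b111 << 17) | (n << 14)   # A19-A17 = 111, A16-A14 = n, A13-A0 = 0
    last = first | ((1 << 14) - 1)      # same upper bits, A13-A0 all 1s
    return first, last


for n in range(8):
    first, last = y_range(n)
    size_k = (last - first + 1) // 1024
    print(f"Y{n}: {first:05X}H to {last:05X}H ({size_k}K)")
```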




Each Y controls 
one block. 


Figure 10-13. 74LS138 as Decoder (See Example 10-7) 


Review Questions 


1. The MEMR signal from the CPU is connected to the ______ pin of the ROM chip.

2. The MEMW signal from the CPU is connected to the ______ pin of the RAM chip.

3. The CS pin of the memory chip is normally an ______ (active-low, active-high)
signal.

4. The 74LS138 has a total of ______ outputs.

5. The Y output of the 74138 is ______ (active low, active high).


SECTION 10.3: IBM PC MEMORY MAP 


All x86 CPUs in real mode provide 20 address bits (A0-A19). Therefore, the
maximum amount of memory that they can access is one megabyte. How this 1M is
allocated in the original PC is the main topic of this section. The 20 lines, A0-A19, of the
system address bus can take the lowest value of all 0s to the highest value of all 1s in
binary. Converting these values to hexadecimal gives an address range of 00000H to FFFFFH.
This is shown in Figure 10-14. Any address that is assigned to any memory block in the
8088-based original PC must fall between these two values. This includes all x86
microprocessors in real mode.

The 20-bit address of the 8088 provides a maximum of 1M (1024K bytes) of 
memory space. Of the 1024K bytes, the designers of the original IBM PC decided to set 
aside 640K for RAM, 128K for video display RAM (VDR), and 256K for ROM, as shown 
in Figure 10-15. In today’s PC, 640K bytes is not that much, but the standard of the per- 
sonal computer in 1980 was 64K bytes of memory. At that time, 640K seemed like more 
than anyone would ever need. Next we discuss the memory map of the PC. 
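
The allocation just described can be turned into address ranges with a short calculation. This sketch assumes the three regions are packed back to back from address 00000H upward in the order RAM, video display RAM, ROM; the name `memory_map` is chosen only for this illustration.

```python
K = 1024

# (name, size in bytes), in ascending address order (assumed)
REGIONS = [("RAM", 640 * K), ("Video display RAM", 128 * K), ("ROM", 256 * K)]


def memory_map(regions):
    """Return a list of (name, first address, last address) tuples."""
    result = []
    start = 0
    for name, size in regions:
        result.append((name, start, start + size - 1))
        start += size
    return result


for name, first, last in memory_map(REGIONS):
    print(f"{first:05X}H to {last:05X}H  {name}")

total = sum(size for _, size in REGIONS)
print(f"total = {total // K}K bytes")
```

The total of 1024K bytes matches the 1M address space given by 20 address bits.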


A19                      A0
0000 0000 0000 0000 0000 = 00000H minimum 20-bit address
1111 1111 1111 1111 1111 = FFFFFH maximum 20-bit address


Figure 10-14. 20-bit Address Range in Real Mode for x86 CPUs 




Conventional memory: 640K of RAM 


In the x86 PC, the addresses from 00000 to 9FFFFH, including location 9FFFFH, are set
aside for RAM. In early PCs, only 64K to 256K bytes of RAM were on the motherboard and
the rest had to be expanded by adding a memory expansion plug-in card. In those early
models, when a RAM memory board was installed, switches had to be set to inform BIOS and
DOS of the added memory. In today's x86 PC this is done by the CMOS set-up process and
this information is kept by the CMOS NV-RAM for the next cold boot. Of the 640K bytes of
memory, some is used by the operating system (the amount depends on the version of DOS)
and the rest of the available RAM is used by utilities and application programs. This
640K bytes of memory is commonly referred to as conventional memory. Notice that even
though the vast majority of PCs use MS Windows for the operating system, the above
concepts are still valid since DOS is embedded into Windows to run legacy applications.

Of the total amount of RAM installed, the first 1K (00000 to 003FF = 1024 bytes)
is set aside for the interrupt vector table (see Chapter 15). 00400 to 004FF is set aside for
the BIOS temporary data area. Finally, a certain number of kilobytes is occupied by the
DOS operating system itself.


FFFFF
        ROM 256K
C0000
BFFFF
        Video display RAM 128K
A0000
9FFFF
        RAM 640K
00000

Figure 10-15. Memory Map of the IBM PC


Example 10-8 
Show the calculation that verifies that addresses 00000 to 9FFFFH comprise 640K bytes. 


Solution: 


To calculate the total number of bytes for a given memory address range, subtract the two
addresses and add 1 to get the total bytes in hex. Then the hex number is converted to decimal
and divided by 1024 to get K bytes.


   9FFFF         9FFFF
 - 00000       + 00001
   9FFFF         A0000 hex = 655,360 decimal = 640K
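The same calculation can be checked in a few lines of Python. This is only a sketch; the function name range_size is our own and is not part of DOS or DEBUG.

```python
def range_size(first: int, last: int) -> int:
    """Number of bytes in the inclusive address range first..last."""
    return last - first + 1          # subtract the two addresses and add 1

size = range_size(0x00000, 0x9FFFF)
print(f"{size:X}H = {size} bytes = {size // 1024}K")
```

The division by 1024 converts the byte count into K bytes, as in the solution above.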


BIOS data area 


As mentioned earlier, the BIOS data area is a section of RAM memory used by 
BIOS to store some extremely important system information. A partial list of that infor- 
mation is given in Figure 10-16. BIOS stores system information in the BIOS data area as 
it tests each section of the PC. The operating system navigates the system hardware with 
the help of information stored in the BIOS data area. For example, the BIOS data area tells 
the operating system how many serial and parallel ports are installed in the PC. We will 
examine this topic further in the serial and parallel port chapters. 


Partial Listing of IBM PC RAM Memory Map for Interrupt, BIOS Data


Memory Location              Bytes   Description
0000:0000 to 0000:03FF       1024    interrupt table

0000:0400 to 0000:0401       2       port address of com1
0000:0402 to 0000:0403       2       port address of com2
0000:0404 to 0000:0405       2       port address of com3
0000:0406 to 0000:0407       2       port address of com4

0000:0408 to 0000:0409       2       port address of lpt1
0000:040A to 0000:040B       2       port address of lpt2
0000:040C to 0000:040D       2       port address of lpt3
0000:040E to 0000:040F       2       port address of lpt4

0000:0410 to 0000:0411       2       list of installed hardware
0000:0412 to 0000:0412       1       initialization flag

0000:0413 to 0000:0414       2       memory size (K bytes)
0000:0415 to 0000:0416       2       memory in I/O channel (if any)

0000:0417 to 0000:0418       2       keyboard status flag
0000:0419 to 0000:0419       1       alternate key entry storage


Figure 10-16. The BIOS Data Area in PC


Video display RAM (VDR) map


To display information on the monitor of the PC, the CPU must first store that
information in memory called video display RAM (VDR). It is the job of the video con-
troller to display the contents of VDR on the screen. Therefore, the address of the VDR
must be within the CPU address range. In the x86 PC, from A0000 to BFFFFH, a total of
128K bytes of the CPU's addressable memory is allocated for video. Of that 128K, only a
portion is used for VDR, the amount used depending on the mode in which the video sys-
tem is being used (text or graphics), and the resolution. For example, the monochrome
video mode uses only addresses starting at B0000 up to 4K bytes of RAM, color graphics
mode uses addresses starting at B8000, and VGA has a starting address of A0000. See
Table 10-4. For more details of each video mode and how many bytes of memory are used
in text and graphics modes and their resolution, see Chapter 16.


Table 10-4: Video Display RAM Memory Map


Adapters           Number of Bytes Used    Starting Address
CGA, EGA, VGA      16,384 (16K)            B8000H
MDA, EGA, VGA      4096 (4K)               B0000H
EGA, VGA           65,536 (64K)            A0000H


ROM address and cold boot in the PC 
When power is applied to a CPU it must wake up at an address that belongs to ROM.
Obviously, the first code executed by the CPU must be stored in nonvolatile memory. The
IBM PC is no exception to this design rule. After RESET the 8088 has the values shown in
Table 10-5. This means that upon RESET, the 8088 starts to fetch information from CS:IP of
FFFF:0000, which gives the physical address FFFF0H. This is the reason that BIOS ROM is
located at the upper address range of the memory map. As a result, when the PC is RESET,
ROM BIOS is the memory block that is accessed first by the CPU. The ROM BIOS has, among
other things, programs that do the testing of the CPU, ROM, and RAM. After those tests,
it initializes all peripheral devices, sets up the system, and loads the operating system
from hard disk into DRAM and hands over the control of the PC to the operating system.
Since the microprocessor starts to fetch and execute instructions from physical location
FFFF0H there must be an opcode sitting in that ROM location. In the x86 PC, the CPU finds
the opcode for the FAR jump, EA, at location FFFF0H and the target address of the JUMP.
You can verify that on your PC regardless of the microprocessor installed on the
motherboard. Example 10-9 shows one such case using a simple DEBUG command. Notice in
Example 10-9 that the date of ROM BIOS of a PC is stored in locations F000:FFF5 to
F000:FFFD of BIOS ROM.


Table 10-5: 8088 After RESET


CPU      Contents
CS       FFFF
DS       0000
SS       0000
ES       0000
IP       0000

(Reprinted by permission of Intel Corporation, Copyright Intel 1989)
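The step from the logical address FFFF:0000 to the physical address FFFF0H follows the usual real-mode rule: shift the segment left by one hex digit and add the offset. A minimal Python sketch is given below; the helper name physical_address is hypothetical, and the 20-bit mask models the address bus width of the 8088.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

reset_vector = physical_address(0xFFFF, 0x0000)   # CS:IP after RESET
bios_date = physical_address(0xF000, 0xFFF5)      # start of the BIOS date
print(f"{reset_vector:05X}H  {bios_date:05X}H")
```

Notice that F000:FFF0 and FFFF:0000 are two logical addresses for the same physical location.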


Example 10-9 

Using the DEBUG dump command, verify the JMP address for the cold boot and the BIOS date. 
Solution: 

From the directory containing DEBUG, enter the following: 


C>DEBUG
-d FFFF:0 LF
FFFF:0000  EA 5B E0 00 F0 30 34 2F-30 33 2F 30 37 00 FC       .[...04/03/07..


The first 5 bytes show the jump opcode “EA” and the destination “F000:E05B”. The next
8 bytes show the BIOS date, 04/03/07.
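To see how the 15 dumped bytes are interpreted, the following Python sketch decodes them by hand. The byte values are taken from the dump in this example; the variable names are our own. The offset and segment of the jump target are each stored low byte first.

```python
dump = bytes.fromhex("EA 5B E0 00 F0 30 34 2F 30 33 2F 30 37 00 FC")

opcode = dump[0]                                  # EA = FAR jump
offset = int.from_bytes(dump[1:3], "little")      # 5B E0 -> E05B
segment = int.from_bytes(dump[3:5], "little")     # 00 F0 -> F000
date = dump[5:13].decode("ascii")                 # 8 ASCII characters

print(f"JMP {segment:04X}:{offset:04X}, BIOS date {date}")
```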


Example 10-10 


Suppose that you buy a software package and encounter a problem installing and running it on 
your computer. After contacting the technical support department of the manufacturer, you are 
told that the package is good for the BIOS ROM date of 10/08/07, but you are not told how to 
find the date. Use DEBUG to find the date for the ROM BIOS of an x86 PC. 


Solution: 


In the DOS directory (or wherever you keep DEBUG), type the following at the DOS prompt: 


C> DEBUG
-D F000:FFF5 FFFD
F000:FFF0                 30 34 2F-30 33 2F 30 37 00           04/03/07.
-Q

C>


The BIOS date is stored at F000:FFF5 through F000:FFFD. In the above case, the BIOS date is
04/03/07, which is earlier than the 10/08/07 date that you hoped to find.


Review Questions 


1. What address range is called conventional memory? How many K bytes is that?

2. If the starting physical address of VDR is B0000H, what is the last address if it uses
16K bytes of RAM? 
(a) Show the beginning and ending physical addresses. 
(b) Give the corresponding logical addresses. 

3. If the total ROM memory space used by BIOS and other expansion boards is 92K 
bytes, how many bytes are still unused? 

4. What are the contents of CS and IP after the 8088 is reset (cold boot)? 

5. What is the implication of Question 4? 




SECTION 10.4: DATA INTEGRITY IN RAM AND ROM 


When storing data, one major concern is maintaining data integrity. That is, ensur- 
ing that the data retrieved is the same as the data stored. The same principle applies when 
transferring data from one place to another. There are many ways to ensure data integrity 
depending on the type of storage. The checksum method is used for ROM, and the parity 
bit method is used for DRAM. For mass storage devices such as hard disks and for trans- 
ferring data on the Internet, the CRC (cyclic redundancy check) method is employed. In 
this section we discuss the checksum and parity methods. 


Checksum byte 


To ensure the integrity of the contents of ROM, every PC must perform a check- 
sum calculation. The process of checksum will detect any corruption of the contents of 
ROM. One of the causes of ROM corruption is current surge, either when the PC is turned 
on or during operation. The checksum method uses a checksum byte. This checksum byte 
is an extra byte that is tagged to the end of a series of bytes of data. To calculate the check- 
sum byte of a series of bytes of data, the following steps can be taken. 


1. Add the bytes together and drop the carries.
2. Take the 2's complement of the total sum, and that is the checksum byte, which
becomes the last byte of the stored information.

To perform the checksum operation, add all the bytes, including the checksum
byte. The result must be zero. If it is not zero, one or more bytes of data have been changed
(corrupted). To clarify these important concepts, see Examples 10-11 and 10-12.
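The two steps and the verification can also be sketched in Python. The function names checksum_byte and is_intact are our own, and the data bytes are those of Example 10-11; masking with FFH plays the role of dropping the carries.

```python
def checksum_byte(data):
    """Steps 1 and 2: 8-bit sum of the data, then its 2's complement."""
    total = sum(data) & 0xFF         # add the bytes and drop the carries
    return (-total) & 0xFF           # 2's complement, kept to one byte

def is_intact(block):
    """Checksum operation: data plus checksum byte must add up to zero."""
    return (sum(block) & 0xFF) == 0

data = [0x25, 0x62, 0x3F, 0x52]
cks = checksum_byte(data)
print(f"checksum byte = {cks:02X}H")
print(is_intact(data + [cks]))                        # unchanged data
print(is_intact([0x25, 0x22, 0x3F, 0x52, cks]))       # 62H changed to 22H
```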


Checksum program 


When the PC is turned on, one of the first things the BIOS does is to test the sys- 
tem ROM. The code for such a test is stored in the BIOS ROM. Figure 10-17 shows the 
program using the checksum method. Notice in the code how all the bytes are added 
together without keeping track of carries. Then, the total sum is ORed with itself to
see if it is zero. The zero flag is expected to be set to high upon return from this subrou- 
tine. If it is not, the ROM is corrupted. 


          ROS_CHECKSUM PROC NEAR          ;NEXT_ROS_MODULE
B90020          MOV   CX,8192             ;NUMBER OF BYTES TO ADD
          ROS_CHECKSUM_CNT:               ;ENTRY PT. FOR OPTIONAL ROS TEST
32C0            XOR   AL,AL
          C26:
0207            ADD   AL,DS:[BX]
43              INC   BX                  ;POINT TO NEXT BYTE
E2FB            LOOP  C26                 ;ADD ALL BYTES IN ROS MODULE
0AC0            OR    AL,AL               ;SUM = 0?
C3              RET
          ROS_CHECKSUM ENDP


Figure 10-17. PC BIOS Checksum Routine 


(Reprinted by permission from “IBM Technical Reference” c. 1984 by International Business Machines Corporation)




Example 10-11 


Assume that we have 4 bytes of hexadecimal data: 25H, 62H, 3FH, and 52H. 
(a) Find the checksum byte. 

(b) Perform the checksum operation to ensure data integrity. 

(c) If the second byte 62H had been changed to 22H, show how checksum detects the error. 


Solution: 


(a) The checksum is calculated by first adding the bytes.

          25H
        + 62H
        + 3FH
        + 52H
        -----
         118H

    The sum is 118H, and dropping the carry, we get 18H. The checksum byte is the 2's
    complement of 18H, which is E8H.

(b) Adding the series of bytes including the checksum byte must result in zero. This
    indicates that all the bytes are unchanged and no byte is corrupted.

          25H
        + 62H
        + 3FH
        + 52H
        + E8H
        -----
         200H   (dropping the carry, we get 00H)

(c) Adding the series of bytes including the checksum byte shows that the result is not zero,
    which indicates that one or more bytes have been corrupted.

          25H
        + 22H
        + 3FH
        + 52H
        + E8H
        -----
         1C0H   (dropping the carry, we get C0H)


Example 10-12 


Assuming that the last byte of the following data is the checksum byte, show whether the data 
has been corrupted or not: 28H, C4H, BFH, 9EH, 87H, 65H, 83H, 50H, A7H, and 51H. 


Solution: 
The sum of the bytes plus the checksum byte must be zero; otherwise, the data is corrupted.
28H + C4H + BFH + 9EH + 87H + 65H + 83H + 50H + A7H + 51H = 500H

By dropping the accumulated carries (the 5), we get 00. The data is not corrupted. See Figure 
10-17 for a program that performs this verification. 
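As a rough model of what the routine in Figure 10-17 does, the Python sketch below walks through the same steps on the data of this example. It is only an illustration: the function ros_checksum and its parameters are our own stand-ins for the registers, with an 8-bit accumulator in place of AL and a returned flag in place of ZF.

```python
def ros_checksum(memory, start, count):
    """Model of the BIOS routine: returns True when the zero flag would be set."""
    al = 0                               # XOR AL,AL
    bx = start                           # BX points to the first byte
    for _ in range(count):               # LOOP C26 repeats CX times
        al = (al + memory[bx]) & 0xFF    # ADD AL,DS:[BX], carry is lost
        bx += 1                          # INC BX
    return al == 0                       # OR AL,AL sets ZF if AL is zero

block = [0x28, 0xC4, 0xBF, 0x9E, 0x87, 0x65, 0x83, 0x50, 0xA7, 0x51]
print(f"{sum(block):X}H", ros_checksum(block, 0, len(block)))
```

In the BIOS the count is 8192 bytes; here it is simply the length of the example data.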


Use of parity bit in DRAM error detection 


System boards or memory modules are populated with DRAM chips of various 
organizations, depending on the time they were designed and the availability of a given 
chip at a reasonable cost. The memory technology is changing so fast that DRAM chips 




on the boards have a different look every year or two. While early PCs used 64K DRAMs, 
current PCs commonly use 1G chips. To understand the use of a parity bit in detecting data 
storage errors, we use some simple examples from the early PCs to clarify some very 
important design concepts. It must be noted that in today’s PCs, these design concepts are 
still the same, even though the DRAMs have much higher density and FPGAs in the 
chipset are used in place of TTL logic gates. You may wish to review DRAM organiza- 
tion and capacity, covered earlier in this chapter, before proceeding. 


DRAM memory banks 


The arrangement of DRAM chips on the system or memory module boards is 
often referred to as a memory bank. For example, the 64K bytes of DRAM can be 
arranged as one bank of 8 IC chips of 64K x 1 organization, or 4 banks of 16K x 1 organ- 
ization. The first IBM PC introduced in 1981 used memory chips of 16K x 1 organization. 
Figure 10-18 shows the memory banks for 640K bytes of RAM using 256K and 1M 
DRAM chips. Notice the use of an extra bit for every byte of data to store the parity bit. 
With the extra parity bit, every bank requires an extra chip of x 1 organization for parity 
check. Figure 10-19 shows DRAM design and parity bit circuitry for a bank of DRAM. 
First, note the use of the 74LS158 to multiplex the 16 address lines A0-A15, changing 
them to the 8 address lines of MA0-MA7 (multiplexed address) as required by the 64K x
1 DRAM chip. The resistors are for the serial bus line termination to prevent undershoot- 
ing and overshooting at the inputs of DRAM. They range from 20 to 50 ohms, depending 
on the speed of the CPU and the printed circuit board layout. 

A few additional observations about Figure 10-19 should be made. The output of 
multiplexer addresses MA0-MA7 will go to all the banks. Likewise, memory data
MD0-MD7 and memory data parity MDP will go to all the banks. The 74LS245 not only
buffers the data bus MD0-MD7 but also boosts it to drive all DRAM inputs. Since the
banks of the DRAMs are connected in parallel and the capacitance loading is additive, the 
data line must be capable of driving all the loads. Next we discuss how parity is used to 
detect RAM defects. 
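The bank arithmetic behind a configuration such as that of Figure 10-18 can be checked with a short Python sketch. The helper chips_per_bank is hypothetical; it assumes that every bank stores 8 data bits plus 1 parity bit, with the parity bit held in a single chip of x 1 organization.

```python
K = 1024

def chips_per_bank(data_chip_width: int) -> int:
    """Data chips needed for 8 bits, plus one x1 chip for the parity bit."""
    return 8 // data_chip_width + 1

# (locations per bank, width of each data chip) for four banks
banks = [(256 * K, 4), (256 * K, 4), (64 * K, 4), (64 * K, 4)]

total_bytes = sum(locations for locations, _ in banks)
total_chips = sum(chips_per_bank(width) for _, width in banks)
print(total_bytes // K, "K bytes using", total_chips, "chips")
print("one bank of x1 chips needs", chips_per_bank(1), "chips")
```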


Parity bit generator/checker in the IBM PC 


There are two types of errors that can occur in DRAM chips: soft error and hard
error. In a hard error, some bits or an entire row of memory cells inside the memory chip
get stuck to high or low permanently, thereafter always producing 1 or 0 regardless of
what you write into the cell(s). In a soft error, a single bit is changed from 1 to 0 or from
0 to 1 due to current surge or certain kinds of particle radiation in the air. Parity is used to
detect such errors. Including a parity bit to ensure data integrity in RAM is the most wide-
ly used method since it is the simplest and cheapest. This method can only indicate if there
is a difference between the data that was written to memory and the data that was read. It
cannot correct the error as is the case with some high-performance computers. In those
computers and some of the x86-based servers, the EDC (error detection and correction)
method is used to detect and correct the error bit. The early IBM PC and compatibles use
the 74S280 parity bit generator and checker to implement the concept of the parity bit. The
study of that chip should help us to understand the parity bit concept.


            D7...D4      D3...D0      Parity
Bank 3:     64K x 4      64K x 4      64K x 1       (64K x 9)
Bank 2:     64K x 4      64K x 4      64K x 1       (64K x 9)
Bank 1:     256K x 4     256K x 4     256K x 1      (256K x 9)
Bank 0:     256K x 4     256K x 4     256K x 1      (256K x 9)

64K x 4 is a single 256K-bit chip
256K x 4 is a single 1M-bit chip


Figure 10-18. A Possible Memory Configuration for 640K DRAM


[Figure: two 74LS158 multiplexers (inputs include A4-A7 and A12-A15, with an address
select line) produce the 8 multiplexed addresses MA0-MA7 for all banks; a 74LS245 buffers
D0-D7 onto MD0-MD7, which go to all banks along with MDP; a 74S280 generates and checks
the parity bit under control of MEMR and MEMW; PCK, enabled from PB4 of the 8255, goes
to NMI.]


Figure 10-19. DRAM Connection in the IBM PC

(Reprinted by permission from “IBM Technical Reference” c. 1984 by International Business Machines Corporation)


74S280 parity bit generator and checker


In order to understand the parity bit circuitry of Figure 10-19 it is necessary first
to understand the 74LS280 parity bit generator and checker chip. This chip has 9 inputs
and 2 outputs. Depending on whether an even or odd number of ones appears in the input,
the even or odd output is activated according to Table 10-6.


Table 10-6: 74280 Parity Check

          Inputs               Outputs
Case      A-H       I          Even     Odd
1         Even      0          1        0
2         Even      1          0        1
3         Odd       0          0        1
4         Odd       1          1        0


As can be seen from Table 10-6, if all 9 inputs have an even number of 1 bits, the
even output goes high, as in cases 1 and 4. If the 9 inputs have an odd number of high bits,
the odd output goes high, as in cases 2 and 3. The way the IBM PC uses this chip is as fol-
lows. Notice that in Figure 10-19, inputs A-H are connected to the data bus, which is 8
bits, or one byte. The I input is used as a parity bit to check the correctness of the byte of
data read from memory. When a byte of information is written to a given memory loca-
tion in DRAM, the even-parity bit is generated and saved on the ninth DRAM chip as a
parity bit with use of control signal MEMW. This is done by activating the tri-state buffer
using MEMW. At this point, I of the 74S280 is equal to zero since MEMR is high. When
a byte of data is read from the same location, the parity bit is gated into the I input of the
74S280 through MEMR. This time the odd output is taken out and fed into a 74LS74. If
there is a difference between the data written and the data read, the Q output (called PCK,
parity bit check) of the 74LS74 is activated and activates NMI, indicating that there is
a parity bit error, meaning that the data read is not the same as the data written.
Consequently, it will display a parity bit error message. For example, if the byte of data
written to a location has an even number of 1s, A to H has an even number of 1s, and I is
zero, then the even-parity output of the 74S280 becomes 1 and is saved on parity bit DRAM.
This is case 1 shown in Table 10-6. If the same byte of data is read and there is an even
number of 1s (the byte is unchanged), I from the ninth bit DRAM, which is 1, is input to
the 74S280, even becomes low, and odd becomes high, which is case 2 in Table 10-6. This
high from the odd output will be inverted and fed to the 74LS74, making Q low. This
means that /Q (the complement of Q) is high, thereby indicating that the written byte is
the same as the byte read and that no errors occurred. If the number of 1s in the byte has
changed from even to odd and the 1 from the saved parity DRAM makes the number of
inputs even (case 4 above), the odd output becomes low, which is inverted and passed to
the 7474 D flip-flop. That makes Q = 1 and /Q = 0, which signals the NMI to display a
parity bit error message on the screen.
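The write and read sequence just described can be modeled in Python. This is a behavioral sketch only, not a description of the circuit timing; the function names are our own, and the two outputs follow Table 10-6.

```python
def parity_280(byte: int, i: int):
    """Return (even, odd) outputs for data inputs A-H and the I input."""
    ones = bin(byte & 0xFF).count("1") + (i & 1)
    odd = ones % 2
    return 1 - odd, odd

def write_parity(byte: int) -> int:
    """On a write, I = 0 and the even output is stored as the ninth bit."""
    even, _ = parity_280(byte, 0)
    return even

def read_ok(byte: int, stored_parity: int) -> bool:
    """On a read, the stored bit feeds I; odd output high means no error."""
    _, odd = parity_280(byte, stored_parity)
    return odd == 1

p = write_parity(0x03)               # 03H has an even number of 1s
print(p, read_ok(0x03, p), read_ok(0x02, p))
```

Any single-bit change in the byte makes the odd output go low on the read, which is the condition that leads to NMI.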


Review Questions 


1. Find the checksum byte for the following bytes: 24H, 76H, F5H, 98H, 89H, 7AH,
61H, C0H.

2. Show a simple program in Assembly language to find the checksum byte of the 8 
bytes of information (code or data) given in Question 1. Assume that SI equals the off- 
set address of the data. 

3. In a given PC we have only 512K of memory on the motherboard. Show possible
configurations and the number of chips used to add memory up to the maximum 
allowed by the limits of conventional memory if we have each of the following. 
Include the parity bit in your configuration and count. 

(a) 64K x 1 (b) 64K x 4 and 64K x 1 

4. To detect corruption of information stored in RAM and ROM memories, system
designers use the ____ method for RAM and the ____ method for ROM.

5. Assume that due to slight current surge in the power supply, a byte of RAM has been 
corrupted while the computer is on. Can the system detect the corruption while the 
computer is on? Is this also the case for ROM? 




SECTION 10.5: 16-BIT MEMORY INTERFACING 


In the design of current x86 PCs, a single IC chip called a chipset has replaced the 
100 or so logic ICs connected together in the original PC. As a result, the details of CPU 
connection to memory and other peripherals are not visible for educational purposes. The 
16-bit bus interfacing to memory chips is one of these details that is now buried within a 
chipset but still needs to be understood. In this section we explore memory interfacing for 
16-bit CPUs. We use the 286 as an example but the concepts can apply to any 16-bit 
microprocessor. We also discuss the topics of memory cycle time and bus bandwidth. 


ODD and EVEN banks 


In a 16-bit CPU such as the 80286, memory locations 00000—FFFFF are desig- 
nated as odd and even bytes as shown in Figure 10-20. Although Figure 10-20 shows only 
1 megabyte of memory, the concept of odd and even banks applies to the entire memory 
space of a given processor with a 16-bit data bus. To distinguish between odd and even
bytes, the CPU provides a signal called BHE (bus high enable). BHE in association with
A0 is used to select the odd or even byte according to Table 10-7.


Examine Figure 10-20 to see how the odd and even addresses are designated
for the 16-bit-wide data buses. Figure 10-21 shows 640 KB of DRAM for 16-bit
buses. Figure 10-22 shows the connection for the 16-bit data bus. In Figure 10-22,
notice the use of A0 and BHE as bank selectors. Also notice the use of the 74LS245
chip as a data bus buffer.


[Figure: DRAM banks, including the even bank (A0 = 0)]


Figure 10-21. 640K Bytes of DRAM with Odd and Even Banks Designation


Table 10-7: Distinguishing Between Odd and Even Bytes

BHE     A0      Transfer       Data lines
0       0       Even word      D0-D15
0       1       Odd byte       D8-D15
1       0       Even byte      D0-D7
1       1       None
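The role of BHE and A0 as bank selectors can be summarized as a small lookup, sketched here in Python. The dictionary and function names are our own; both signals are treated as active low, so a 0 selects the corresponding bank.

```python
# (BHE, A0) -> (kind of transfer, data lines used)
TRANSFERS = {
    (0, 0): ("even word", "D0-D15"),   # both banks selected
    (0, 1): ("odd byte", "D8-D15"),    # odd bank only
    (1, 0): ("even byte", "D0-D7"),    # even bank only
    (1, 1): ("none", None),            # neither bank selected
}

def decode(bhe: int, a0: int):
    """Return the transfer type and data lines for a BHE/A0 combination."""
    return TRANSFERS[(bhe, a0)]

for bhe in (0, 1):
    for a0 in (0, 1):
        print(bhe, a0, decode(bhe, a0))
```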


[Figure: chip select decoding circuitry and two 74LS245 data bus buffers, one for the
256K even bank (with connections to other even banks) and one for the 256K odd bank
(with connections to other odd banks)]


Figure 10-22. 16-bit Data Connection in the Systems with 16-bit Data Bus


Memory cycle time and inserting wait states


To access an external device such as memory or I/O, the CPU provides a fixed 
amount of time called a bus cycle time. During this bus cycle time, the read and write 
operation of memory or I/O must be completed. Here, we cover the memory bus cycle 
time. Bus cycle time for I/O devices is given in the next chapter. For the sake of clarity 
we will concentrate on reading memory, but the concepts apply to write operations as 
well. The bus cycle time used for accessing memory is often referred to as MC (memory 
cycle) time. The time from when the CPU provides the addresses at its address pins to 
when the data is expected at its data pins is called memory read cycle time. While in older 
processors such as the 8088 the memory cycle time takes 4 clocks, in the newer CPUs the 
memory cycle time is 2 clocks. In other words, in all x86 CPUs from the 286 to the 
Pentium, the memory cycle time is only 2 clocks. If memory is slow and its access time 
does not match the MC time of the CPU, extra time can be requested from the CPU to 
extend the read cycle time. This extra time is called a wait state (WS). In the 1980s, the 
clock speed for memory cycle time was the same as the CPU’s clock speed. For example, 
in the 20 MHz 286/386/486 processors, the buses were working at the same speed of 20 
MHz. This resulted in 2 x 50 ns = 100 ns for the memory cycle time (1/20 MHz = 50 ns). 
When the CPU’s speed was under 100 MHz, the bus speed was comparable to the CPU 
speed. In the 1990s the CPU speed exploded to 1 GHz (gigahertz) while the bus speed 
maxed out at around 133 MHz. The gap between the CPU speed and the bus speed is one 
of the biggest problems in the design of high-performance computers. To avoid the use 
of too many wait states in interfacing memory to CPU, cache memory and other high- 
speed DRAMs were invented. These are discussed in Chapter 22. 

It must be noted that memory access time is not the only factor in slowing down 
the CPU, even though it is the largest one. The other factor is the delay associated with 
signals going through the data and address path. Delay associated with reading data stored 
in memory has the following two components: 




1. The time taken for address signals to go from CPU pins to memory pins, going 
through decoders and buffers (e.g., 74LS245). This, plus the time it takes for the data 
to travel from memory to CPU, is referred to as a path delay. 

2. The memory access time to get the data out of the memory chip. This is the largest of 
the two components. 


The total sum of these two must not exceed the memory read cycle time provided by
the CPU. Memory access time is the largest and takes about 80% of the read cycle time. 
See Examples 10-13 and 10-14 for further clarification of these points. These concepts 
are critical in the design of microprocessor-based products. 


Example 10-13 


Calculate the memory cycle time of a 20-MHz 80386 system with
(a) 0 WS, 

(b) 1 WS, and 

(c) 2 WS. 

Assume that the bus speed is the same as the processor speed. 


Solution: 


1/20 MHz = 50 ns is the processor clock period. Since the 386 bus cycle time of zero wait states 
is 2 clocks, we have: 


                                   80386 20 MHz
Memory cycle time with 0 WS        2 x 50 = 100 ns
Memory cycle time with 1 WS        100 + 50 = 150 ns
Memory cycle time with 2 WS        100 + 50 + 50 = 200 ns


It is preferred that all bus activities be completed with 0 WS. However, if the read and write 
operations cannot be completed with 0 WS, we request an extension of the bus cycle time. This 
extension is in the form of an integer number of WS. That is, we can have 1, 2, 3, and so on 
WS, but not 1.25 WS. 
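The table in this example follows from one formula: each wait state adds one clock period to the 2-clock bus cycle. A short Python sketch, with a function name and parameters of our own choosing, is given below. The default of 2 clocks is the zero-wait-state cycle of the 286 and later processors described in the text.

```python
def memory_cycle_ns(bus_mhz: float, wait_states: int, base_clocks: int = 2) -> float:
    """Memory cycle time in ns: (base clocks + wait states) x clock period."""
    clock_ns = 1000 / bus_mhz            # 1/20 MHz = 50 ns
    return (base_clocks + wait_states) * clock_ns

for ws in (0, 1, 2):
    print(ws, "WS:", memory_cycle_ns(20, ws), "ns")
```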


Example 10-14 


A 20-MHz 80386-based system is using ROM of 150 ns speed. Calculate the number of wait 
states needed if the path delay is 25 ns. 


Solution: 


If ROM access time is 150 ns and the path delay is 25 ns, every time the 80386 accesses ROM
it must spend a total of 175 ns to get data into the CPU. A 20-MHz CPU with zero WS provides
only 100 ns (2 x 50 ns = 100 ns) for the memory read cycle time. To match the CPU bus speed
with this ROM we must insert 2 wait states. This makes the cycle time 200 ns (100 + 50 + 50
= 200 ns). Notice that we cannot ask for 1.5 WS since the number of WS must be an integer.
That would be like going to the store and wanting to buy half an apple. You must get one or
more complete WS or none at all.
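The rounding up to a whole number of wait states can be expressed with a ceiling function. The Python sketch below uses names of our own and assumes, as in this example, a 2-clock bus cycle at zero wait states.

```python
import math

def wait_states_needed(bus_mhz, access_ns, path_delay_ns, base_clocks=2):
    """Smallest integer number of WS that covers access time plus path delay."""
    clock_ns = 1000 / bus_mhz
    required_ns = access_ns + path_delay_ns          # 150 + 25 = 175 ns
    shortfall_ns = required_ns - base_clocks * clock_ns
    return max(0, math.ceil(shortfall_ns / clock_ns))

print(wait_states_needed(20, 150, 25))
```

The shortfall here is 75 ns, or 1.5 clock periods, which the ceiling function rounds up to 2 wait states.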


Accessing EVEN and ODD words 


As you recall from earlier chapters, Intel defines 16-bit data as a word. The 
address of a word can start at an even or an odd number. For example, in the instruction 
“MOV AX,[2000]” the address of the word being fetched into AX starts at an even 
address. In the case of “MOV AX,[2007]” the address starts at an odd address. In systems 
with a 16-bit data bus, accessing a word from an odd addressed location can be slower. 
This issue is important and applies to 32-bit and 64-bit systems with 386 and Pentium
processors, as we will see in Chapter 22.

As shown in Figure 10-23, in the 8-bit system, accessing a word is treated like 
accessing two bytes regardless of whether the address is odd or even. Since accessing a 
byte takes one memory cycle, accessing any word will take 2 memory cycles. In the 16- 
bit system, accessing a word with an even address takes one memory cycle. That is 
because one byte is carried on D0-D7 and the other on D8—D15 in the same memory 
cycle. However, accessing a word with an odd address requires two memory cycles. For 
example, see how accessing the word in the instruction “MOV AX,[F617]” works as 
shown in Figure 10-24. Assuming that DS = F000H in this instruction, the contents of
physical memory locations FF617H and FF618H are being moved into AX. In the first 
cycle, the 286 CPU accesses location FF617H and puts it in AL. 

In the second cycle, the contents of memory location FF618H are accessed and 
put into AH. The lesson to be learned from this is to try not to put any words on an odd 
address if the program is going to be run on a 16-bit system. Indeed this is so important 
that there is a pseudo-op specifically designed for this purpose. It is the EVEN directive 
and is used as follows: 


        EVEN
VALUE1  DW   ?


This ensures that VALUE1, a word-sized operand, is located in an even address
location. Therefore, an instruction such as “MOV AX,VALUE1” or “MOV VALUE1,CX”
will take only a single memory cycle.


Assume that DS = F000                   MC (Memory Cycle)

“MOV AL,[FF51]”                         Odd byte takes 1 MC
“MOV AL,[FF52]”                         Even byte takes 1 MC
“MOV AX,[FF70]”                         Even word takes 2 MC
“MOV AX,[FF91]”                         Odd word takes 2 MC


Figure 10-23. Accessing Even and Odd Words in the 8-bit CPU


DS = F000
MOV AX,[F617]

1st Memory Cycle (MC): location FF617H is read into AL
2nd MC:                location FF618H is read into AH


Figure 10-24. Accessing an Odd-Addressed Word in a 16-bit Processor (80286)
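The cycle counts of Figures 10-23 and 10-24 can be reproduced with a simple model: count how many bus-width units of memory the operand touches. The Python sketch below is an approximation under that assumption; the function name and parameters are our own.

```python
def memory_cycles(address: int, size_bytes: int, bus_bytes: int) -> int:
    """Memory cycles needed to access size_bytes starting at address."""
    first_unit = address // bus_bytes
    last_unit = (address + size_bytes - 1) // bus_bytes
    return last_unit - first_unit + 1

# 8-bit data bus (8088): every word takes 2 MC
print(memory_cycles(0xFF70, 2, 1), memory_cycles(0xFF91, 2, 1))
# 16-bit data bus (80286): even word 1 MC, odd word 2 MC
print(memory_cycles(0xFF70, 2, 2), memory_cycles(0xF617, 2, 2))
```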


Bus bandwidth


The main advantage of the 16-bit data bus is a doubling of the rate of transfer of 
information between the CPU and the outside world. The rate of data transfer is general- 
ly called bus bandwidth. In other words, bus bandwidth is a measure of how fast buses 
transfer information between the CPU and memory or peripherals. The wider the data bus, 
the higher the bus bandwidth. However, the advantage of the wider external data bus 
comes at the cost of increasing the size of the printed circuit board. Now you might ask 
why we should care how fast buses transfer information between the CPU and outside, as 
long as the CPU is working as fast as it can. The problem is that the CPU cannot process 
information that it does not have. In other words, the speed of the CPU must be matched 
with the higher bus bandwidth; otherwise, there is no use for a fast CPU. This is like driving
a Porsche or Ferrari in first gear; it is a terrible underusage of CPU power. Bus bandwidth
is measured in MB (megabytes) per second and is calculated as follows:


bus bandwidth = (1/bus cycle time) x bus width in bytes 


In the above formula, bus cycle time can be either memory or I/O cycle time. The 
I/O cycle time is discussed in Chapter 11. Example 10-15 clarifies the concept of bus 
bandwidth. As can be seen from Example 10-15, there are two ways to increase the bus 
bandwidth: Either use a wider data bus or shorten the bus cycle time (or do both). That is 
exactly what 386, 486, and Pentium processors have done. While the data bus width has 
increased from 16-bit in the 80286 to 64-bit in the Pentium, the bus speed has reached
a maximum of about 133 MHz. Again, it must be noted that although the processor’s speed
can go to 1 GHz or higher, the bus speed is limited to around 133 MHz. The reason for 
this is that the signals become too noisy for the circuit board if they are above 100 MHz. 
This is even worse for the ISA expansion slot. The ISA bus speed is limited to around 8 
MHz. This is because the ISA slot uses large and bulky connectors and they are too noisy 
for a speed of more than 8 MHz. In other words, in PCs with a 500 MHz Pentium, the 
CPU must slow down to 8 MHz when accessing the ISA bus. The PCI bus was introduced 
to solve this limitation. It can go as high as 133 MHz and its data bus width is 64-bit. 


Review Questions 


1. True or false. If A0 = 0 and BHE = 1, a byte is being transferred on the D0-D7 data
bus from an even-address location.

2. True or false. If we have A0 = 1 and BHE = 0, a byte is being transferred on the
D0-D7 data bus from an odd-address location.

3. True or false. If we have A0 = 1 and BHE = 0, a byte is being transferred on the
D8-D15 data bus.

4. True or false. If we have A0 = 0 and BHE = 0, a word is being transferred on the
D0-D15 data bus.

5. In the instruction “MOV AX,[2000]”, the transferring of data into the accumulator
takes ____ memory cycles for the 8088 and ____ for the 80286.

6. A 16-MHz 286 has a memory cycle time of ____ ns if it is used with a zero wait
state.

7. To interface a 10-MHz 286 processor to a 350-ns access time ROM, how many wait
states are needed?




Example 10-15 


Calculate memory bus bandwidth for the following microprocessors if the bus speed is 20 MHz. 


(a) 286 with 0 WS and 1 WS (16-bit data bus ) 
(b) 386 with 0 WS and 1 WS (32-bit data bus) 


Solution: 


The memory cycle time for both the 286 and 386 is 2 clocks, with zero wait states. With the 20 
MHz bus speed we have a bus clock of 1/20 MHz = 50 ns. 


(a) Bus bandwidth = (1/(2 x 50 ns)) x 2 bytes = 20M bytes/second (MB/s)
With 1 wait state, the memory cycle becomes 3 clock cycles
3 x 50 = 150 ns and the memory bus bandwidth is = (1/150 ns) x 2 bytes = 13.3 MB/s


(b) Bus bandwidth = (1/(2 x 50 ns)) x 4 bytes = 40 MB/s
With 1 wait state, the memory cycle becomes 3 clock cycles
3 x 50 = 150 ns and the memory bus bandwidth is = (1/150 ns) x 4 bytes = 26.7 MB/s


From the above it can be seen that the two factors influencing bus bandwidth are: 


1. The read/write cycle time of the CPU 
2. The width of the data bus 


Notice in this example that the bus speed of the 286/386 was given as 20 MHz. That 
means that the CPU can access memory on the board at this speed. If this 286/386 is used on a 
PC board with an ISA expansion slot, it must slow down to 8 MHz when communicating with 
the ISA bus since the maximum bus speed for the ISA bus is 8 MHz. This is done by the chipset 
circuitry. 
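
The arithmetic of Example 10-15 can be checked with a short script. The sketch below is written in Python purely as a calculator; the function names are illustrative and not part of the text, and it assumes, as the example does, a memory cycle of 2 bus clocks plus one clock per wait state.

```python
def memory_cycle_ns(bus_mhz, wait_states=0, clocks_per_cycle=2):
    """Memory cycle time in ns: (2 + wait states) bus clocks."""
    clock_ns = 1000.0 / bus_mhz          # one bus clock period in ns
    return (clocks_per_cycle + wait_states) * clock_ns


def bus_bandwidth_mb_s(bus_mhz, bus_width_bits, wait_states=0):
    """Bus bandwidth in millions of bytes per second."""
    bytes_per_cycle = bus_width_bits // 8
    cycle_ns = memory_cycle_ns(bus_mhz, wait_states)
    return bytes_per_cycle * 1000.0 / cycle_ns


# The four cases of Example 10-15 (20 MHz bus)
for name, width in (("286", 16), ("386", 32)):
    for ws in (0, 1):
        bw = bus_bandwidth_mb_s(20, width, ws)
        print(f"{name}, {ws} WS: {bw:.1f} MB/s")
```

Both factors named above appear directly as parameters: the cycle time (bus clock and wait states) and the data bus width.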


PROBLEMS 


SECTION 10.1: SEMICONDUCTOR MEMORIES 


1. What is the difference between a 4M memory chip and 4M of computer memory as 
far as capacity is concerned? 

2. True or false. The more address pins, the more memory locations are inside the chip. 

3. True or false. The more data pins, the more each location inside the chip will hold. 

4. True or false. The more data pins, the higher the capacity of the memory chip. 

5. True or false. With a fixed number of address pins, the more data pins, the greater the 
capacity of the memory chip. 

6. The speed of a memory chip is referred to as its 

7. True or false. The price of memory chips varies according to capacity and speed. 

8. The main advantage of EEPROM over UV-EPROM is 

9. True or false. SRAM has a larger cell size than DRAM. 

10. Which of the following, EPROM, DRAM, and SRAM, must be refreshed periodical- 
ly? 

11. Which memory is used for cache?

12. Which of the following, SRAM, UV-EPROM, NV-RAM, DRAM, and cache memo- 
ry, is volatile memory? 

13. RAS and CAS are associated with which memory? 
(a) EPROM (b) SRAM (c) DRAM (d) all of the above 

14. Which memory needs an external multiplexer? 


(a) EPROM (b) SRAM (c) DRAM (d) all of the above 
15. Find the organization and the capacity of memory chips with the following pins. 
(a) EEPROM A0-A14, D0-D7 (b) UV-EPROM A0-A12, D0-D7
(c) SRAM A0-A11, D0-D7 (d) SRAM A0-A12, D0-D7
(e) DRAM A0-A10, D0 (f) SRAM A0-A12, D0-D7
(g) EEPROM A0-A11, D0-D7 (h) UV-EPROM A0-A10, D0-D7
(i) DRAM A0-A8, D0-D3 (j) DRAM A0-A7, D0-D7
16. Find the capacity, address, and data pins for the following memory organizations.
(a) 16K x 8 ROM (b) 32K x 8 ROM
(c) 64K x 8 SRAM (d) 256K x 4 DRAM
(e) 64K x 8 ROM (f) 64K x 4 DRAM
(g) 1M x 4 DRAM (h) 4M x 4 DRAM
(i) 64K x 8 DRAM


SECTION 10.2: MEMORY ADDRESS DECODING


17. Find the address range of the following memory design. 


18. Using NAND gates and inverters, design decoding circuitry for the address range
0C0000H-0C0FFFH.

19. Find the address range for Y0, Y3, and Y6 of the 74LS138 for the following design. 
This is the ROM interfacing with the 8088 CPU in the original PC. 


20. Using the 74138, design the memory decoding circuitry in which the memory block
controlled by Y0 is in the range 00000H to 03FFFH. Indicate the size of the memory
block controlled by each Y.

21. Find the address range for Y3, Y6, and Y7 in Problem 20.

22. Using the 74138 and OR gates, design memory decoding circuitry in which the
memory block controlled by Y0 is in the 80000H to 807FFH space. Indicate the size of
the memory block controlled by each Y.


23. Find the address range for Y1, Y4, and Y5 in Problem 22.

24. The CS pin of the memory chip is active (low, high). What about the RD pin?

25. Which one can accommodate more inputs, the 74138 or CPLD?


SECTION 10.3: IBM PC MEMORY MAP 


26. Indicate the address range and total kilobytes of memory allocated to the RAM, ROM,
and video display RAM of the PC.

27. What address range is called conventional memory, and how many K bytes is it?

28. Can we increase the size of conventional memory? Explain your answer.

29. What are the contents of CS and IP in the 8088 upon RESET?

30. A user wants to add some EPROM to a PC. Can he/she use the address range
00000-9FFFFH? What happens if this range is used?

31. Give the logical and physical location where the BIOS ROM date is stored.

32. Suppose that the memory address range C0000H-C7FFFH is used in a certain plug-in
adapter card. How many K bytes is that, and is this memory in the RAM or ROM
allocated area?

33. If a video card uses only 4K bytes of VDR and the starting address is B0000H, what
is the ending address of this VDR?

34. In a certain video card the starting address is B8000H and it uses only 16K bytes of
memory. What is the ending address of this video card?

35. Why is ROM mapped where it is in the PC? Why can't we use addresses starting at
00000?

36. When the CPU is powered up, at what physical address does it expect to see the first
opcode? In the PC, what opcode is there normally?


SECTION 10.4: DATA INTEGRITY IN RAM AND ROM 


37. Find the checksum byte for the following bytes.
34H, 54H, 7FH, 11H, E6H, 99H

38. For each of the following sets of data (the last byte is the checksum byte) verify if the
data is corrupted.
(a) 29H, 1CH, 16H, 38H, and 6DH (b) 29H, 1CH, 16H, 30H, and 6DH

39. To maintain data integrity, the checksum method is used for ____ type memory
and the parity bit method for ____ memory.

40. True or false. ROM is tested for corruption during a cold boot-up, but data corruption
in RAM can be detected any time the system is on.

41. A given PC needs only 320K bytes to reach the maximum allowed conventional
memory. Show the memory configuration using 256K x 1 and 64K x 1 memory chips.
How many chips are needed? (Include the parity bit.)

42. Repeat Problem 41 if we have 256K x 4 and 64K x 4 chips in addition to 64K x 1 and
256K x 1.

43. Why is it preferable to use higher density memory chips in memory design?

44. True or false. To access DRAM, the RAS address is provided first and then the CAS
address.

45. True or false. The 74S280 is both a parity generator and a checker.

46. In the 74S280 we have 10010011 for A-H inputs and I = 0. What is the status of the
even and odd output pins?

47. In the 74S280 we have 11101001 for A-H inputs and I = 0. What is the status of the
even and odd output pins?

48. In the 74S280 we have 10001001 for A-H inputs and I = 1. What is the status of the
even and odd output pins?




SECTION 10.5: 16-BIT MEMORY INTERFACING 


49. Odd and even banks are associated with which microprocessor, the 8088 or the 
80286? 

50. How many memory chips are needed if we use 256K x 8, 256K x 1, 64K x 8, and 
64K x 1 memory chips for conventional memory of the 80286 PC/compatible? 

51. State the status of A0 and BHE when accessing an odd-addressed byte.

52. State the status of A0 and BHE when accessing an even-addressed word.

53. State the status of A0 and BHE if we only want to access D0-D7 of the data bus.

54. State the status of A0 and BHE if we only want to access D8-D15 of the data bus.

55. What is the use of 74LS245 in memory interfacing? 

56. What is the bus bandwidth unit? 

57. Give the variables that affect the bus bandwidth. 

58. True or false. One way to increase the bus bandwidth is to widen the data bus. 

59. True or false. An increase in the number of address bus pins results in a higher bus
bandwidth for the system.

60. Calculate the memory bus bandwidth for the following systems. 
(a) 80286 of 10 MHz and 0 WS 
(b) 80286 of 16 MHz and 0 WS 


ANSWERS TO REVIEW QUESTIONS 


SECTION 10.1: SEMICONDUCTOR MEMORIES 


1. Nanoseconds

2. (a) 2K x 8, 16K bits (b) 8K x 8, 64K (c) 64K x 4, 256K (d) 256K x 1, 256K

3. (a) 128K bits, 14 address, and 8 data (b) 256K, 15 address, and 8 data
(c) 1M, 10 address, and 1 data (d) 1M, 9 address, and 4 data
(e) 512K, 16 address, and 8 data (f) 4M, 10 address, and 4 data

4. It takes much less time to erase and does not need to be removed from the system 
board. 

5. NV-RAM 


SECTION 10.2: MEMORY ADDRESS DECODING 
MEROE 

WE 

Active low 

8 

Active low 


at 


SECTION 10.3: IBM PC MEMORY MAP 


00000-9FFFFH, 640K bytes 

(a) B0000H-B0FFFH (b) B000:0000-B000:0FFF

256K - 92K = 164K

CS = FFFFH, IP = 0000 

It indicates that the CPU fetches the first opcode at the physical address FFFFOH 
when the system is turned on. Therefore, no RAM can be mapped into the last seg- 
ment of the 8088; the memory space must be occupied by a cold boot ROM. 




SECTION 10.4: DATA INTEGRITY IN RAM AND ROM 


1. Adding the bytes: 24H + 76H + F5H + 98H + 89H + 7AH + 61H + C2H = 44DH. 
Dropping the carries, we get 4DH, and taking the 2's complement, we have B3H for 
the checksum byte. 
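
The same checksum procedure can be sketched in a high-level language as a cross-check of the hand calculation. The Python below is only an illustration; the function names are not from the text.

```python
def checksum_byte(data):
    """Checksum byte: 2's complement of the 8-bit sum of the data bytes."""
    total = sum(data) & 0xFF      # add the bytes and drop the carries
    return (-total) & 0xFF        # take the 2's complement


def is_intact(data_with_checksum):
    """Data plus its checksum byte must add up to 00 (carries dropped)."""
    return (sum(data_with_checksum) & 0xFF) == 0


data = [0x24, 0x76, 0xF5, 0x98, 0x89, 0x7A, 0x61, 0xC2]
print(f"{checksum_byte(data):02X}H")   # B3H, as computed by hand above
```

Appending the checksum byte to the data and summing again gives zero when nothing has been corrupted, which is what `is_intact` tests.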



2.      MOV  SI,OFFSET DATA  ;LOAD THE OFFSET ADDRESS
        MOV  CX,08           ;LOAD THE COUNTER
        SUB  AL,AL
BOO:    ADD  AL,[SI]         ;ADD THE BYTE AND IGNORE THE CARRY
        INC  SI              ;POINT TO NEXT BYTE
        LOOP BOO             ;CONTINUE UNTIL COUNT IS ZERO
3. Since the maximum limit is 640K bytes, we need add only 128K bytes of RAM (640 
— 512 = 128). 


(a) two banks each of nine chips of 64K x 1, total = 18 chips 

(b) two banks each of two 64K x 4 to contain data and one 64K x 1 for parity, total = 
6 chips 

4. Parity bit generation/checker, checksum

5. While the computer is on, any corruption in the contents of RAM is detected by the
parity bit error checking circuitry when that data is accessed (read) again. However,
the ROM corruption is not detected since the checksum detection is performed only
when the system is booted.


SECTION 10.5: 16-BIT MEMORY INTERFACING 


1. True
2. False
3. True
4. True
5. 2 and 1
6. 125 ns
7. 2 WS



CHAPTER 11 


8255 I/O PROGRAMMING 


OBJECTIVES 
Upon completion of this chapter, you will be able to: 


>> Code Assembly language instructions to read and write data to and 
from I/O ports 

>> Diagram the design of peripheral I/O using the 74LS373 output latch 
and the 74LS244 input buffer 

>> Describe the I/O address map of x86 PCs 

>> List the differences in memory-mapped I/O versus peripheral I/O 

>> Describe the purpose of the 8255 programmable peripheral interface 
chip 

>> Code Assembly language instructions to perform I/O through 
the 8255 

>> Code I/O programming for Microsoft Visual C/C++ 

>> Code I/O programming for Linux C/C++ 




In addition to memory space, x86 microprocessors also have I/O space. This
allows them to access ports. Ports are used either to bring data into the CPU from an external
device such as the keyboard or to send data from the CPU to an external device such as a 
printer. In this chapter we study I/O instructions and I/O design for x86 PCs. In Section 
11.1 we discuss I/O instructions and programming. In Section 11.2 we look at ways to 
design I/O ports for 8088-based systems. In Section 11.3, the I/O map of the x86 IBM PC 
is given. The 8255 chip and its programming are discussed in Section 11.4. In addition, 
the details of an 8255 connection to the ISA/PC104 bus will be given along with I/O pro- 
gramming using C/C++. 


SECTION 11.1: 8088 INPUT/OUTPUT INSTRUCTIONS 


All x86 microprocessors, from the 8088 to the Pentium, can access external 
devices called ports. This is done using I/O instructions. The x86 CPU is one of the few 
processors that have I/O space in addition to memory space. While memory can contain
both opcodes and data, I/O ports contain data only. There are two instructions for this
purpose: “OUT” and “IN”. These instructions can send data from the accumulator (AL or
AX) to ports or bring data from ports into the accumulator. In accessing ports, we can use 
an 8-bit or 16-bit data port. Since 8-bit data ports in the 8088 are the most widely used, 
we will concentrate on them and introduce 16-bit data ports only in the last section of this 
chapter. 


8-bit data ports 


The 8-bit I/O operation of the 8088 is applicable to all x86 CPUs from the 8088 
to the Pentium. The 8-bit port uses the D0-D7 data bus to communicate with I/O devices.
In 8-bit port programming, register AL is used as the source of data when using the OUT 
instruction and as the destination for the IN instruction. This means that to input or out- 
put data from any other registers, the data must first be moved to the AL register. 
Instructions OUT and IN have the following formats: 


            Inputting Data          Outputting Data
Format:     IN  dest,source         OUT dest,source
(1)         IN  AL,port#            OUT port#,AL
(2)         MOV DX,port#            MOV DX,port#
            IN  AL,DX               OUT DX,AL


In format (1), port# is the address of the port and can be from 00 to FFH. This 
8-bit address allows 256 input ports and 256 output ports. In this format, the 8-bit port 
address is carried on address bus A0-A7. No segment register is involved in computing
the address, in contrast to the way data is accessed from memory. 

In format (2), the port# is also the address of the port, except that it can be from 
0000 to FFFFH, allowing up to 65,536 input and 65,536 output ports. In this case, the 16- 
bit port address is carried on address bus A0-A15, and no segment register (DS) is 
involved. This is the way Intel Corporation expanded the number of ports from 256 to 
65,536 while maintaining compatibility with the earlier 8085 microprocessors. The use of 
a register as a pointer for the port address has an advantage in that the port address can be 
changed very easily, especially in cases of dynamic compilations where the port address 
can be passed to DX. 


How to use I/O instructions 


I/O instructions are widely used in programming peripheral devices such as print- 
ers, hard disks, and keyboards. The port address can be either 8-bit or 16-bit. For an 8-bit 
port address, we can use the immediate addressing mode. The following program sends a 
byte of data to a fixed port address of 43H. 




MOV AL, 36H ;AL=36H 
OUT 43H,AL ;send value 36H to port address 43H 


The 8-bit address used in immediate addressing mode limits the number of ports 
to 256 for input plus 256 for output. To have a larger number of ports we must use the 16- 
bit port address instruction. 

To use the 16-bit port address, register indirect addressing mode must be used. 
The register used for this purpose is DX. The following program sends values 55H and 
AAH to I/O port address 300H (a 16-bit port address). In other words, the program below 
toggles the bits of port address 300H continuously. 


BACK:   MOV  DX,300H    ;DX = port address 300H
        MOV  AL,55H
        OUT  DX,AL      ;toggle the bits
        MOV  AL,0AAH
        OUT  DX,AL      ;toggle the bits
        JMP  BACK


Notice that we can only use register DX for 16-bit I/O addresses; no other reg- 
ister can be used for this purpose. Also notice the use of register AL for 8-bit data. For 
example, the following code transfers the contents of register BL to port address 378H. 


        MOV  DX,378H    ;DX=378 the port address
        MOV  AL,BL      ;load data into accumulator
        OUT  DX,AL      ;write contents of AL to port
                        ;whose address is in DX


To bring into the CPU a byte of data from an external device (external to the CPU) 
we use the IN instruction. Example 11-1 shows decision making based on the data that 
was input. 

Just like the OUT instruction, the IN instruction uses the DX register to hold the 
address and AL to hold the arrived 8-bit data. In other words, DX holds the 16-bit port 
address while AL receives the 8-bit data brought in from an external port. The following 
program gets data from port address 300H and sends it to port address 302H. 


        MOV  DX,300H    ;load port address
        IN   AL,DX      ;bring in data
        MOV  DX,302H
        OUT  DX,AL      ;send it out


Example 11-1 


In a given 8088-based system, port address 22H is an input port for monitoring the temperature. 
Write Assembly language instructions to monitor that port continuously for the temperature of 
100 degrees. If it reaches 100, then BH should contain 'Y'. 


Solution: 


BACK:   IN   AL,22H     ;get the temperature from port # 22H
        CMP  AL,100     ;is temp = 100?
        JNZ  BACK       ;if not, keep monitoring
        MOV  BH,'Y'     ;temp = 100, load 'Y' into BH
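
The control flow of this polling loop can also be modeled away from the hardware. The following is a minimal Python sketch, not part of the text: `read_port` is a hypothetical stand-in for the IN instruction, and the list of readings is invented test data.

```python
def monitor_temperature(read_port, port=0x22, target=100):
    """Poll `port` until it returns `target`; return the value placed in BH."""
    while True:
        al = read_port(port) & 0xFF   # IN  AL,22H
        if al == target:              # CMP AL,100
            return "Y"                # MOV BH,'Y'
        # JNZ BACK: otherwise keep monitoring


# Hypothetical stand-in for the sensor: the temperature rises on each read.
readings = iter([97, 98, 99, 100])
bh = monitor_temperature(lambda port: next(readings))
print(bh)
```

The loop leaves only when the port value equals 100, which mirrors the CMP/JNZ pair in the Assembly language version.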


Review Questions 


1. In the x86 system, if we use only the 8-bit address bus A0-A7 for port addresses, what
is the maximum number of (a) input, and (b) output ports?
2. The x86 can have a maximum of how many I/O ports? 


3. What does the instruction “OUT 24H,AL” do? 

4. Write Assembly language instructions to accept data input from port 300H and send 
it out to port 304H. 

5. Write Assembly language instructions to place the status of port 60H in CH. 


SECTION 11.2: I/O ADDRESS DECODING AND DESIGN 


In this section we show the design of simple I/O ports using TTL logic gates 
74LS373 and 74LS244. For the purpose of clarity we use simple logic gates such as AND 
and inverter gates for decoders. It may be helpful to review the address decoding section 
for memory interfacing in the preceding chapter before you embark on this section. The 
concept of address bus decoding for I/O instructions is exactly the same as for memory. 
The following are the steps: 

1. The control signals IOR and IOW are used along with the decoder. 
2. For an 8-bit port address, A0-A7 is decoded.
3. If the port address is 16-bit (using DX), A0-A15 is decoded.


Using the 74LS373 in an output port design 


In every computer, whenever data is sent out by the CPU via the data bus, the data 
must be latched by the receiving device. While memories have an internal latch to grab 
the data, a latching system must be designed for simple I/O ports. The 74LS373 can be 
used for this purpose. Notice in Figure 11-1 that in order to make the 74LS373 work as a 
latch, the OC pin must be grounded. For an output latch, it is common to AND the output 
of the address decoder with the control signal IOW to provide the latching action as shown 
in Figure 11-2. See Example 11-2. 
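
The latching action described here can be modeled in a few lines. The sketch below is an illustrative Python model, not a description of the actual circuit in the figure: the class and method names are assumptions, and the port address 99H is taken from the caption of Figure 11-2. The latch captures the data bus only when the decoded address and an active (low) IOW coincide.

```python
class OutputLatch:
    """Model of a 74LS373 used as an output port (OC grounded)."""

    def __init__(self, port_address):
        self.port_address = port_address
        self.q = 0                      # latched output Q0-Q7

    def bus_cycle(self, address, data, iow_n):
        """Apply one bus cycle; iow_n is the active-low IOW signal."""
        decoded = (address & 0xFF) == self.port_address   # A0-A7 decoder
        g = decoded and iow_n == 0      # decoder output ANDed with IOW
        if g:
            self.q = data & 0xFF        # G active: latch the data bus
        return g


latch = OutputLatch(0x99)
latch.bus_cycle(0x99, 0x36, iow_n=0)    # OUT 99H,AL with AL = 36H
latch.bus_cycle(0x98, 0xFF, iow_n=0)    # different port: ignored
latch.bus_cycle(0x99, 0x00, iow_n=1)    # no write strobe: ignored
print(f"{latch.q:02X}H")
```

Only the first cycle changes the latch contents; the other two fail either the address match or the IOW condition.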


IN port design using the 74LS244 


Likewise, when data is coming in by way of a data bus, it must come in through 
a three-state buffer. This is referred to as tristated, which comes from the term tri-state 
buffer (“tri-state” is a registered trademark of National Semiconductor Corp.). 

As was the case for memory chips, such a tri-state buffer is internal and therefore 
invisible. For the simple input ports we use the 74LS244 chip. See Figure 11-4 for the 
internal circuitry of the 74LS244. In Figure 11-4, notice that since 1G and 2G each con- 
trol only 4 bits of the 74LS244, they 
both must be activated for the 8-bit 
input. Examine Figure 11-5 to see the 
use of the 74LS244 as an entry port to 
the system data bus. Notice in Figures 
11-5 and 11-6 how the address 
decoder and the IOR control signal 
together activate the tri-state input. 

The 74LS244 not only plays the role of buffer, but also provides the incoming data
with sufficient driving capability to travel all the way to the CPU. Indeed, the 74LS244
chip is widely used for buffering and providing high driving capability for unidirectional
buses. The 74LS245 is used for bidirectional buses, as seen in Chapter 9.


Figure 11-1. 74LS373 D Latch
(Reprinted by permission of Texas Instruments, Copyright Texas Instruments, 1988)


Figure 11-2. Design for “OUT 99H,AL”


Example 11-2

Show the design of an output port with an I/O address of 31FH using the 74LS373.


Solution:


31FH is decoded, then ANDed with IOW to activate the G pin of the 74LS373 latch. This is
shown in Figure 11-3.


Figure 11-3. Design for Output Port Address of 31FH


Memory-mapped I/O


Communicating with I/O devices using IN and OUT instructions is referred to as
peripheral I/O. Some designers also refer to it as isolated I/O. However, there are many
microprocessors, such as the new RISC processors, that do not have IN and OUT
instructions. In such cases, these microprocessors use what is called memory-mapped I/O.
In memory-mapped I/O, a memory location is assigned to be an input or output port. The
following are the differences between peripheral I/O and memory-mapped I/O in the x86 PC.

1. In memory-mapped I/O, we must use instructions that access memory locations to
access the I/O ports instead of IN and OUT instructions. For example, an instruction
such as "MOV AL,[2000]" will access an input port of memory address 2000 and
"MOV [2010],AL" will access the output port.

2. In memory-mapped I/O, the entire 20-bit address, A0-A19, must be decoded. This is
in contrast to peripheral I/O, in which only A0-A15 are decoded. Furthermore, since
the 20-bit address involves both the segment and an offset, the DS register must be
loaded before memory-mapped I/O is accessed. For example, if physical memory
address 35000H is used for the input port, the following instructions can be used to
access the port.

        MOV  AX,3000H   ;load the segment value
        MOV  DS,AX
        MOV  AL,[5000]  ;get a byte from loc. 35000H

Physical address 35000H is generated by shifting DS left one hex digit and adding it
to offset address 5000 (30000 + 5000 = 35000H). Since all 20 address bits are decoded,
the decoding circuitry for memory-mapped I/O is more expensive.

3. In memory-mapped I/O circuit interfacing, control signals MEMR and MEMW are
used. This is in contrast to peripheral I/O, in which IOR and IOW are used.

4. In peripheral I/O we are limited to 65,536 input ports and 65,536 output ports,
whereas in memory-mapped I/O the number of ports can be as high as 2^20
(1,048,576). Of course, that many ports are never needed.

5. In memory-mapped I/O, one can perform arithmetic and logic operations on I/O data
directly without first moving them into the accumulator. In memory-mapped I/O, data
can be transferred into any register, rather than into the accumulator.

6. One major and severe disadvantage of memory-mapped I/O is that it uses memory
address space, which could lead to memory space fragmentation.


Figure 11-4. 74LS244 Octal Buffer
(Reprinted by permission of Texas Instruments, Copyright Texas Instruments, 1988)
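
The address arithmetic behind the memory-mapped port at 35000H, and the port counts quoted in the list, can be verified with a short calculation. This Python sketch is for illustration only; `physical_address` is a name chosen here, not one used in the text.

```python
def physical_address(segment, offset):
    """8088 real-mode address: segment shifted left one hex digit plus offset."""
    return ((segment << 4) + offset) & 0xFFFFF   # 20-bit address bus


print(f"{physical_address(0x3000, 0x5000):05X}H")   # DS = 3000H, offset 5000H

peripheral_ports = 2 ** 16       # 16-bit port address in DX (A0-A15)
memory_mapped_ports = 2 ** 20    # full 20-bit address (A0-A19)
print(peripheral_ports, memory_mapped_ports)
```

The first line printed is the physical address decoded by the memory-mapped port; the second shows why the limit rises from 65,536 to 1,048,576 when all 20 address lines take part.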


Example 11-3 
Show the design of “IN AL,9FH” using the 74LS244 as a tri-state buffer. 


Solution: 


9FH is decoded, then ANDed with IOR. To activate OC of the 74LS244, it must be inverted 
since OC is an active-low pin. This is shown in Figure 11-5. 


Review Questions 


1. Designers use a ____ (latch, tri-state buffer) for output and a ____
(latch, tri-state buffer) for input.

2. Why do we use latches in I/O design?

3. Why is the 74LS373 called the transparent latch?

4. To use the 74LS373 as a latch, OC must be set to ____ permanently.

5. True or false. To access the maximum number of ports in the x86, we must decode
addresses A0-A15.

6. In memory-mapped I/O, which signal is used to select the (a) output, and (b) input
devices?


SECTION 11.3: I/O ADDRESS MAP OF x86 PCs


Designers of the original IBM PC decided to make full use of I/O instructions. 
This led to assignment of different port addresses to various peripherals such as LPT and 
COM ports, and other chips and devices. The list of the designated I/O port addresses is 
referred to as the I/O map. Table 11-1 shows the I/O map for the x86 PC. For a much more
detailed I/O map of the x86 PC, see Appendix E. Any system that needs to be compatible
with the x86 IBM PC must follow the I/O map of Table 11-1. For example, the map shows 
that we can use I/O address 300-31F for a prototype card. 


Figure 11-5. Design for “IN AL,9FH”


Figure 11-6. Input port design for “IN AL,5FH”


Absolute vs. linear select address decoding


In decoding addresses, either all of them or a selected number of them are decod- 
ed. If all the address lines are decoded, it is called absolute decoding. If only selected 
address pins are used for decoding, it is called linear select decoding. Linear select is
cheaper, since the fewer the inputs, the fewer the gates needed for decoding. The
disadvantage is that it creates what are called aliases, the same port with multiple addresses. In
cases where linear select is used, we must document port addresses in the I/O map thor- 
oughly. In the first IBM PC, linear select decoding was used and that resulted in large 
numbers of address aliases, as we will see in future chapters. If you see a large gap in the 
I/O address map of the x86 PC, it is due to the address aliases of the original PC. 
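
To see how aliases arise, the sketch below lists every 16-bit port address that selects the same device when only some of the address lines are decoded. It is an illustrative Python model under a stated assumption: the decoded lines are the low-order ones (for example A0-A9) and the upper lines are simply ignored. The function name is not from the text.

```python
def aliases(port, decoded_lines, address_bits=16):
    """All addresses that select `port` when only the low `decoded_lines`
    address lines are decoded and the upper lines are ignored."""
    mask = (1 << decoded_lines) - 1
    step = 1 << decoded_lines
    return list(range(port & mask, 1 << address_bits, step))


found = aliases(0x300, decoded_lines=10)
print(len(found), [f"{a:X}H" for a in found[:4]])
```

With 10 of 16 lines decoded, each port answers at 2^6 = 64 addresses spaced 400H apart, so the undecoded upper lines multiply every port into a family of aliases.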


Prototype addresses 300-31FH in x86 PC 


In the x86 PC, the address range 300H-31FH is set aside for prototype cards to 
be plugged into the expansion slot. These prototype cards can be data acquisition boards 
used to monitor analog signals such as temperature, pressure, and so on. Interface cards 
using the prototype address space use the following signals on the 62-pin section of the 
ISA expansion slot: 

1. IOR and IOW. Both are active low. 
2. AEN signal: AEN = 0 when the CPU is using the bus. 
3. A0-A9 for address decoding.




Table 11-1: I/O Map for the x86 PC

Hex Range       Device

000-01F         DMA controller 1, 8237A-5
020-03F         Interrupt controller 1, 8259A, Master
040-05F         Timer, 8254-2
060-06F         8042 (keyboard)
070-07F         Real-time clock, NMI mask
080-09F         DMA page register, 74LS612
0A0-0BF         Interrupt controller 2, 8259A
0C0-0DF         DMA controller 2, 8237A-5
0F0             Clear math coprocessor busy
0F1             Reset math coprocessor
0F8-0FF         Math coprocessor
1F0-1F8         Fixed disk
200-207         Game I/O
20C-20D         Reserved
21F             Reserved
278-27F         Parallel printer port 2
2B0-2DF         Alternate enhanced graphics adapter
2E1             GPIB (adapter 0)
2E2 & 2E3       Data acquisition (adapter 0)
2F8-2FF         Serial port 2
300-31F         Prototype card
360-363         PC network (low address)
364-367         Reserved
368-36B         PC network (high address)
36C-36F         Reserved
378-37F         Parallel printer port 1
380-38F         SDLC, bisynchronous 2
390-393         Cluster
3A0-3AF         Bisynchronous 1
3B0-3BF         Monochrome display and printer adapter
3C0-3CF         Enhanced graphics adapter
3D0-3DF         Color/graphics monitor adapter
3F0-3F7         Disk controller
3F8-3FF         Serial port 1
6E2 & 6E3       Data acquisition (adapter 1)
790-793         Cluster (adapter 1)
AE2 & AE3       Data acquisition (adapter 2)
B90-B93         Cluster (adapter 2)
EE2 & EE3       Data acquisition (adapter 3)
1390-1393       Cluster (adapter 3)
22E1            GPIB (adapter 1)
2390-2393       Cluster (adapter 4)
42E1            GPIB (adapter 2)
62E1            GPIB (adapter 3)
82E1            GPIB (adapter 4)
A2E1            GPIB (adapter 5)
C2E1            GPIB (adapter 6)
E2E1            GPIB (adapter 7)




Use of simple logic gates as address decoders 


Figure 11-7 shows the circuit design for a 74LS373 latch connected to port 
address 300H of an x86 PC via an ISA expansion slot. Notice the use of signals A0-A9
and AEN. AEN is low when the x86 microprocessor is in control of the buses. After all, 
it is the job of the CPU to control all the peripheral devices and not the DMA. In Figure 
11-7, we are using simple logic gates such as NAND and inverter gates for the I/O address 
decoder. These can be replaced with the 74LS138 chip because the 74LS138 is a group of 
NAND gates in a single chip. 


Figure 11-7. Using Simple Logic Gate for I/O Address Decoder (I/O Address 300H)

Use of 74LS138 as decoder


In current system board design, CPLD (complex programmable logic device) chips are
used for supporting logic such as decoders. In the absence of CPLD, one could use NANDs,
inverters, and 74LS138 chips for decoders as we saw in the preceding chapter for memory
address decoding. The same principle applies to I/O address decoding. Figure 11-8 shows
the 74LS138. To see an example of the use of a 74LS138 for an I/O address decoder,
examine Figure 11-9. Notice how each Y output can control a single device. Figure 11-9
shows the address decoding for an input port located at address 304H. The Y4 output,
together with the IOR signal, controls the 74LS244 input buffer. Alternatively, Y0 along
with the IOW control signal could be used to control a 74LS373 latch. In other words, each
Y output controls a single I/O device. Contrast that with Figure 11-7. The 74LS138 is much
more efficient than the use of simple logic gates as decoders.


Figure 11-8. 74LS138 Decoder
(Reprinted by permission of Texas Instruments, Copyright Texas Instruments, 1988)

Function Table

   Enable      Select       Outputs
   G1  G2      C  B  A      Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7
   X   H       X  X  X      H  H  H  H  H  H  H  H
   L   X       X  X  X      H  H  H  H  H  H  H  H
   H   L       L  L  L      L  H  H  H  H  H  H  H
   H   L       L  L  H      H  L  H  H  H  H  H  H
   H   L       L  H  L      H  H  L  H  H  H  H  H
   H   L       L  H  H      H  H  H  L  H  H  H  H
   H   L       H  L  L      H  H  H  H  L  H  H  H
   H   L       H  L  H      H  H  H  H  H  L  H  H
   H   L       H  H  L      H  H  H  H  H  H  L  H
   H   L       H  H  H      H  H  H  H  H  H  H  L

   G2 = G2A + G2B


IBM PC I/O address decoder


Figure 11-10 shows a 74LS138 chip used as an I/O address decoder in the original IBM
PC. Notice that while A0 to A4 go to individual peripheral input addresses, A5, A6, and A7
are responsible for the output selection of outputs Y0 to Y7. In order to enable the
74LS138, pins A8, A9, and AEN all must be low. While A8 and A9 will directly affect the
port address calculations, AEN is low only when the x86 is in control of the system bus.
Since all the peripherals are programmed by the x86, AEN is low during CPU activity. See
Table 11-2. We will discuss each Y output of the 74LS138 in Figure 11-10 in subsequent
chapters.
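
The decoding just described can be expressed as a small model. The Python sketch below is illustrative only: the function names are chosen here, and the exact wiring of the enable pins is an assumption (A9 and A8 on the two active-low enables, and an inverted AEN on G1), made so that the decoder is enabled only when A8, A9, and AEN are all low, as the text requires.

```python
def ls138(g1, g2a_n, g2b_n, c, b, a):
    """Return the eight active-low outputs Y0..Y7 of a 74LS138."""
    outputs = [1] * 8                         # all outputs high when disabled
    if g1 == 1 and g2a_n == 0 and g2b_n == 0:
        outputs[(c << 2) | (b << 1) | a] = 0  # selected output goes low
    return outputs


def pc_io_decode(address, aen):
    """Index of the Y output that goes low for a port address, or None."""
    def bit(n):
        return (address >> n) & 1
    g1 = 1 if aen == 0 else 0                 # assumed: inverted AEN drives G1
    y = ls138(g1, bit(9), bit(8), bit(7), bit(6), bit(5))
    return y.index(0) if 0 in y else None


for port in (0x20, 0x43, 0x61, 0x300):
    print(f"{port:03X}H ->", pc_io_decode(port, aen=0))
```

Because A5, A6, and A7 drive the select inputs, each Y output covers a block of 32 port addresses; an address with A8 or A9 high, or any cycle with AEN high, selects nothing.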


Figure 11-9. (74LS138 decoder enabling a 74LS244 input buffer between the switches and
the buffered data bus)


Figure 11-10. Port Address Decoding in the Original IBM PC (PC/XT)
(Decoder outputs, in order: to 8237 CS (00-0FH); to 8259 CS (20-2FH); to 8253 CS
(40-4FH); to 8255 CS (60-6FH); writing to DMA page register; writing into NMI register;
not used; not used. AEN = 0 when CPU in charge of buses.)


Table 11-2: Port Addresses Decoding Table on the Original PC

   G2A  G2B |  C   B   A  |
   A9   A8  |  A7  A6  A5 |  A4  A3  A2  A1  A0
   0    0   |  0   0   0  |  0   0   0   0   0     00 Lowest port address
   0    0   |  1   1   1  |  1   1   1   1   1     FF Highest port address


Port 61H and time delay generation


Appendix E provides a detailed I/O map of the PC. In order to maintain compat- 
ibility with the IBM PC and run operating systems such as MS-DOS and Windows, the 
assignment of I/O port addresses must follow the standard set in Appendix E. Port 61H is 
a widely used port, the details of which are shown in Appendix E. We can use this port to 
generate a time delay, which will work in any PC with any type of processor from the 286 
to the Pentium. I/O port 61H has eight bits (D0—D7). Bit D4 is of particular interest to us. 
In all 286 and higher PCs, bit D4 of port 61H changes its state every 15.085 microseconds 
(us). In other words, it stays low for 15.085 us and then changes to high and stays high 
for the same amount of time before it goes low again. This toggling of bit D4 goes on 
indefinitely as long as the PC is on. Chapter 13 provides more details on this topic. 




The following program shows how to use port 61H to generate a delay of 1/2 second.
In this program all the bits of port 310H are toggled with a 1/2 second delay in between.

;TOGGLING ALL BITS OF PORT 310H EVERY 0.5 SEC
        MOV  DX,310H
HERE:   MOV  AL,55H         ;toggle all bits
        OUT  DX,AL
        MOV  CX,33144       ;delay=33144x15.085 us=0.5 sec
        CALL TDELAY
        MOV  AL,0AAH
        OUT  DX,AL
        MOV  CX,33144
        CALL TDELAY
        JMP  HERE

;CX=COUNT OF 15.085 MICROSEC
TDELAY  PROC NEAR
        PUSH AX             ;save AX
W1:     IN   AL,61H
        AND  AL,00010000B
        CMP  AL,AH
        JE   W1             ;wait for 15.085 usec
        MOV  AH,AL
        LOOP W1             ;another 15.085 usec
        POP  AX             ;restore AX
        RET
TDELAY  ENDP


In the above program, notice that when port 61H is read, all the bits are masked 
except D4. The program waits for D4 to change every 15.085 us before it loops again. 


Review Questions 


1. What I/O address range is set aside for prototype cards?

2. In the x86 PC, give the status of the AEN signal when I/O ports are being addressed
by the CPU.

3. In decoding addresses for I/O instructions, why do we need to include AEN, and what 
is the activation level? 

4. Which bit of port 61H toggles every 15.085 us? 

5. Calculate the time delay if CX = 25,000 for Question 4. 


SECTION 11.4: PROGRAMMING AND INTERFACING THE 8255


In this section we study the 8255 chip, a widely used I/O chip. The 8255 is a 40- 
pin DIP chip (see Figure 11-11). It has three separately accessible ports, A, B, and C, 
which can be programmed, hence the name PPI (programmable peripheral interface). 
Each port of the 8255 can be programmed to be input or output. They can also be changed 
dynamically, in contrast to the 74LS244 and 74LS373, which are hard-wired. 


Port A (PA0-PA7)

This 8-bit port A can be programmed all as input or all as output.

Port B (PB0-PB7)

This 8-bit port B can be programmed all as input or all as output.




CHAPTER 11: 8255 I/O PROGRAMMING 299 


Port C (PC0-PC7)


This 8-bit port C can be all input or all output. It can also be split into two parts,
CU (upper bits PC4-PC7) and CL (lower bits PC0-PC3). Each can be used for input or
output. Any of PC0 to PC7 can be programmed individually.


RD and WR 


These two active-low control 
signals are inputs to the 8255. If the 
8255 is using peripheral I/O design, 
IOR and IOW of the system bus are 
connected to these two pins. If the port 
uses memory-mapped I/O, MEMR and 
MEMW activate them. 


RESET 


This is an active-high signal 
input into the 8255 used to clear the 
control register. When RESET is acti- 
vated, all ports are initialized as input 
ports. This pin must be connected to 
the RESET output of the system bus or 
ground, making it inactive. 


A0, A1, and CS 


While CS (chip select) selects the entire chip, address pins A0 and A1 select the
specific port within the 8255. These three pins are used to access ports A, B, C, or the
control register, as shown in Table 11-3.
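
As a rough illustration of the selection described above, the following C sketch computes the I/O address of each 8255 register from a base address and recovers the register from an address. The type and function names are hypothetical and exist only for this sketch; the one fact it relies on is that A1 and A0 pick port A, B, C, or the control register in that order once CS is active.

```c
#include <stdint.h>

/* Register selected inside the 8255 by address pins A1 and A0. */
typedef enum {
    PPI_PORT_A  = 0,   /* A1 A0 = 00 */
    PPI_PORT_B  = 1,   /* A1 A0 = 01 */
    PPI_PORT_C  = 2,   /* A1 A0 = 10 */
    PPI_CONTROL = 3    /* A1 A0 = 11 */
} ppi_reg;

/* I/O address of a register, given the base address that activates CS.
   The two low bits of the base are ignored because A1 and A0 are
   supplied by the register selection. */
uint16_t ppi_address(uint16_t base, ppi_reg reg)
{
    return (uint16_t)((base & ~0x3u) | (unsigned)reg);
}

/* Which register an address selects (looks only at A1 and A0). */
ppi_reg ppi_select(uint16_t address)
{
    return (ppi_reg)(address & 0x3u);
}
```

For a base address of 300H this yields 300H, 301H, 302H, and 303H for ports A, B, C, and the control register, the addresses used in the examples of this section.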


Mode selection of the 8255A 


While ports A, B, and C are used for I/O data, it is the control register that must be
programmed to select the operation mode of the three ports A, B, and C. The ports of the
8255 can be programmed in various modes, as shown in Figure 11-12. Mode 0, the simple
I/O mode, is the most widely used mode and the only one we are concerned with. In this
mode, any of the ports A, B, CL, and CU can be programmed as input or output. In this
mode, all bits are out or all are in. In other words, there is no control of individual
bits. Examples 11-4, 11-5, and 11-6 show programming of 8255 ports in simple I/O mode.


Figure 11-11. 8255 PPI Chip
(Reprinted by permission of Intel Corporation,
Copyright Intel, 1983)


Table 11-3: 8255 Port Selection

CS   A1   A0   Selects
0    0    0    Port A
0    0    1    Port B
0    1    0    Port C
0    1    1    Control register
1    x    x    8255 is not selected

(Reprinted by permission of Intel Corporation,
Copyright Intel Corp. 1983)


Although the ISA bus has been replaced with the PCI bus, the ISA bus specification 
lives on in PC104 form. See Chapter 26. For more information on PC104 see: 


http://www.pc104.org 




Bit     Meaning
D7      1 = I/O Mode; 0 = BSR Mode
D6 D5   Group A mode selection: 00 = Mode 0, 01 = Mode 1, 1x = Mode 2
D4      Group A, Port A: 1 = input; 0 = output
D3      Group A, Port C (Upper: PC7-PC4): 1 = input; 0 = output
D2      Group B mode selection: 0 = Mode 0, 1 = Mode 1
D1      Group B, Port B: 1 = input; 0 = output
D0      Group B, Port C (Lower: PC3-PC0): 1 = input; 0 = output


Figure 11-12. 8255 Control Word Format (I/O Mode)
(Reprinted by permission of Intel Corporation, Copyright Intel, 1983)
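
The control word format of Figure 11-12 can be sketched as a small C helper for mode 0. The names ppi_dir and ppi_mode0_control are hypothetical, chosen for this sketch only; the bit positions assumed are the standard 8255 layout (D7 = I/O mode flag, D4 = port A, D3 = PC upper, D1 = port B, D0 = PC lower, with the mode bits left at 0).

```c
#include <stdint.h>

/* Direction of a port: the 8255 uses 1 for input and 0 for output. */
typedef enum { DIR_OUT = 0, DIR_IN = 1 } ppi_dir;

/* Build the mode 0 (simple I/O) control word for the given directions. */
uint8_t ppi_mode0_control(ppi_dir pa, ppi_dir pb,
                          ppi_dir pc_upper, ppi_dir pc_lower)
{
    uint8_t cw = 0x80;                    /* D7 = 1: I/O mode; D6 D5 = 00 and
                                             D2 = 0 select mode 0 for both groups */
    if (pa == DIR_IN)       cw |= 0x10;   /* D4: port A */
    if (pc_upper == DIR_IN) cw |= 0x08;   /* D3: PC7-PC4 */
    if (pb == DIR_IN)       cw |= 0x02;   /* D1: port B */
    if (pc_lower == DIR_IN) cw |= 0x01;   /* D0: PC3-PC0 */
    return cw;
}
```

For example, PA = out, PB = in, PC upper = out, PC lower = in gives 83H, and all ports as output gives 80H, matching the control words used in the examples of this section.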


Example 11-4 


(a) Find the control word if PA = out, PB = in, PC0-PC3 = in, and PC4-PC7 = out.
(b) Program the 8255 to get data from port B and send it to port A. In addition,
data from PCL is sent out to the PCU.

Use port addresses of 300H-303H for the 8255 chip.


Solution: 
(a) From Figure 11-12 we get the control word of 1000 0011 in binary or 83H.
(b) The code is as follows:


B8255C  EQU  300H            ;base address of 8255 chip
CNTL    EQU  83H             ;PA=out, PB=in, PCL=in, PCU=out

        MOV  DX,B8255C+3     ;load control reg. address (300H+3=303H)
        MOV  AL,CNTL         ;load control byte
        OUT  DX,AL           ;send it to control register
        MOV  DX,B8255C+1     ;load PB address
        IN   AL,DX           ;get the data from PB
        MOV  DX,B8255C       ;load PA address
        OUT  DX,AL           ;send it to PA
        MOV  DX,B8255C+2     ;load PC address
        IN   AL,DX           ;get the bits from PCL
        AND  AL,0FH          ;mask the upper bits
        ROL  AL,1
        ROL  AL,1            ;shift the bits
        ROL  AL,1            ;to upper position
        ROL  AL,1
        OUT  DX,AL           ;send it to PCU
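
The last steps of the listing (mask, four rotates, write back) amount to moving the low nibble of port C into the high nibble. A minimal C sketch of just that data transformation follows; the function name is hypothetical. Because the upper four bits are cleared by the mask first, rotating left four times and shifting left four times give the same byte.

```c
#include <stdint.h>

/* Move the low nibble (PC0-PC3) into the high nibble (PC4-PC7), as
   AND AL,0FH followed by four ROL AL,1 instructions does. */
uint8_t pcl_to_pcu(uint8_t port_c)
{
    return (uint8_t)((port_c & 0x0F) << 4);
}
```

An input of A5H, for instance, becomes 50H: only the lower nibble 5 survives, now in the upper position.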




Example 11-5


The 8255 shown in Figure 11-13 is configured as follows: port A as input, B as output, and all
the bits of port C as output.

(a) Find the port addresses assigned to A, B, C, and the control register. 

(b) Find the control byte (word) for this configuration. 

(c) Program the ports to input data from port A and send it to both ports B and C. 


Solution: 
(a) The port addresses are as follows: 


A9-A2 (CS)    A1 A0   Address   Port
11 0001 00    0  0    310H      Port A
11 0001 00    0  1    311H      Port B
11 0001 00    1  0    312H      Port C
11 0001 00    1  1    313H      Control register

(b) The control word is 90H, or 1001 0000. 

(c) One version of the program is as follows:
        MOV  AL,90H          ;control byte PA=in, PB=out, PC=out
        MOV  DX,313H         ;load control reg address
        OUT  DX,AL           ;send it to control register
        MOV  DX,310H         ;load PA address
        IN   AL,DX           ;get the data from PA
        MOV  DX,311H         ;load PB address
        OUT  DX,AL           ;send it to PB
        MOV  DX,312H         ;load PC address
        OUT  DX,AL           ;and to PC

Using the EQU directive one can rewrite the above program as follows: 
CNTLBYTE EQU  90H            ;PA=in, PB=out, PC=out
PORTA    EQU  310H
PORTB    EQU  311H
PORTC    EQU  312H
CNTLREG  EQU  313H

         MOV  AL,CNTLBYTE
         MOV  DX,CNTLREG
         OUT  DX,AL
         MOV  DX,PORTA
         IN   AL,DX

and so on.


Figure 11-13. 8255 Configuration for Example 11-5 




Example 11-6 


Show the address decoding where port A of the 8255 has an I/O address of 300H, then write a
program to toggle all bits of PA continuously with a 1/4 second delay. Use INT 16H to exit if
there is a keypress.


Solution: 

The address decoding for the 8255 is shown in Figure 11-14. The control word for all ports as 
output is 80H. The program below will toggle all bits of PA indefinitely with a delay in between. 
To prevent locking up the system, we press any key to exit to DOS. 


        MOV  DX,303H         ;CONTROL REG ADDRESS
        MOV  AL,80H          ;ALL PORTS AS OUTPUT
        OUT  DX,AL
AGAIN:  MOV  DX,300H
        MOV  AL,55H
        OUT  DX,AL
        CALL QSDELAY         ;1/4 SEC DELAY
        MOV  AL,0AAH         ;TOGGLE BIT
        OUT  DX,AL
        CALL QSDELAY
        MOV  AH,01
        INT  16H             ;CHECK KEYPRESS
        JZ   AGAIN           ;PRESS ANY KEY TO EXIT
        MOV  AH,4CH
        INT  21H             ;EXIT


QSDELAY PROC NEAR
        MOV  CX,16572        ;16,572x15.085 usec=1/4 sec
        PUSH AX
W1:     IN   AL,61H
        AND  AL,00010000B
        CMP  AL,AH
        JE   W1
        MOV  AH,AL
        LOOP W1
        POP  AX
        RET
QSDELAY ENDP


Notice the use of INT 16H option AH = 01 where the keypress is checked. If there is no key- 
press, it will continue. We must do that to avoid locking up the x86 PC. 


Figure 11-14. 8255 Configuration for Example 11-6 




Buffering 300—31FH address range 


When accessing the system bus via the expansion slot, we must make sure that the 
plug-in card does not interfere with the working of system buses on the motherboard. To 
do that we isolate (buffer) a range of I/O addresses using the 74LS245 chip. In buffering, 
the data bus is accessed only for a specific address range, and access by any address 
beyond the range is blocked. Figure 11-15 shows how the I/O address range 300H-31FH 
is buffered with the use of the 74LS245. Figure 11-16 shows another example of 8255 
interfacing using the 74LS138 decoder. As shown in Figure 11-16, Y0 and Y1 are used for
the 8255 and 8253, respectively. Table 11-4 shows the 74LS138 address assignment.
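
The range check behind this kind of buffering can be sketched in C. The sketch assumes, beyond what the text states, that the buffer enable is derived from address lines A9-A5 together with AEN = 0, and that only A9-A0 take part in I/O decoding; the function name is hypothetical. For every address in 300H-31FH the bits A9-A5 are 11000, so comparing those five bits is enough.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the I/O address lies in 300H-31FH and the CPU (not the DMA
   controller) is driving the bus, i.e. AEN = 0. */
bool buffer_enabled(uint16_t address, bool aen)
{
    uint16_t a = address & 0x3FF;          /* assumed: only A9-A0 are decoded */
    return !aen && (a & 0x3E0) == 0x300;   /* A9..A5 must equal 11000 */
}
```

Addresses 300H and 31FH pass the check, while 2FFH and 320H, or any address presented while AEN = 1, do not.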


[Figure: a 74LS245 placed between the system data bus from the expansion slot (D0-D7)
and the buffered data bus (XD0-XD7).]

Figure 11-15. Buffering I/O Address Range 300-31FH


Table 11-4: Address Assignment for Figure 11-16 


Assignment 
Used by 8255 
Used by 8253 
Available 
Available 
Available 




Figure 11-16. Interface Decoding Circuitry 




Figure 11-17 shows the circuit for buffering all the buses. Notice the use of the 
74LS244 to boost the address and control signals. This ensures the integrity of the signal 
transmitted to the plug-in board. 


[Figure: a 74LS245 (with DIR and OC control inputs), two 74LS244 chips, and a 74LS138,
each feeding the cable connector to the plug-in board. GND = pins B1, B31.]

Figure 11-17. Design of the 8-bit ISA PC Bus Extender


Example 11-7 shows a test program to toggle the PA and PB bits. Notice that in 
order to avoid locking up the system, we use INT 16H to exit upon pressing any key. You 
can modify Example 11-7 to toggle all bits of PA, PB, and PC. Make sure to put a mes- 
sage on the x86 PC screen to prompt the user to exit by pressing any key. 




Example 11-7


Write a program to toggle all bits of PA and PB of the 8255 chip on the PC Trainer. Put a 1/2
second delay in between "on" and "off" states. Use INT 16H to exit if there is a keypress.


Solution:
The program below toggles all bits of PA and PB indefinitely. Pressing any key exits the
program.


        MOV  DX,303H         ;CONTROL REG ADDRESS
        MOV  AL,80H          ;ALL PORTS AS OUTPUT
        OUT  DX,AL
AGAIN:  MOV  DX,300H         ;PA ADDRESS
        MOV  AL,55H
        OUT  DX,AL
        INC  DX              ;PB ADDRESS
        OUT  DX,AL
        CALL HSDELAY         ;1/2 SEC DELAY
        MOV  DX,300H         ;PA ADDRESS
        MOV  AL,0AAH
        OUT  DX,AL
        INC  DX              ;PB ADDRESS
        OUT  DX,AL
        CALL HSDELAY         ;1/2 SEC DELAY
        MOV  AH,01
        INT  16H             ;CHECK KEYPRESS
        JZ   AGAIN           ;PRESS ANY KEY TO EXIT
        MOV  AH,4CH
        INT  21H             ;EXIT


HSDELAY PROC NEAR
        MOV  CX,33144        ;33144x15.085 usec=1/2 sec
        PUSH AX
W1:     IN   AL,61H
        AND  AL,00010000B
        CMP  AL,AH
        JE   W1
        MOV  AH,AL
        LOOP W1
        POP  AX
        RET
HSDELAY ENDP


Notice the use of INT 16H option AH = 01 where the keypress is checked. If there is no key- 
press, it will continue. 


Visual C/C++ I/O programming 


Microsoft Visual C++ is a programming language widely used on the Windows 
platform. Since Visual C++ is an object-oriented language, it comes with many classes 
and objects to make programming easier and more efficient. Unfortunately, there is no 
object or class for directly accessing I/O ports in the full Windows version of Visual C++. 
The reason for that is that Microsoft wants to make sure the x86 system programming is 
under full control of the operating system. This precludes any hacking into the system 
hardware. This applies to Windows NT, 2000, XP, and higher. In other words, none of the 
system INT instructions such as INT 21H and I/O operations that we have discussed in 
previous chapters are applicable in Windows XP and its subsequent versions. To access 
the I/O and other hardware features of the x86 PC in the XP environment you must use 
the Windows Platform SDK provided by Microsoft. The situation is different in the
Windows 9x (95 and 98) environment. While INT 21H and other system interrupt instructions
are blocked in Windows 9x, direct I/O addressing is available. To access I/O directly
in Windows 9x, you must program Visual C++ in console mode. The instruction syntax
for I/O operations is shown in Table 11-5. Notice the use of the underscore character
(_) in both the _outp and _inp instructions. It must also be noted that while x86
Assembly language makes a distinction between the 8-bit and 16-bit I/O addresses by
using the DX register, there is no such distinction in C programming, as shown in Table
11-5. In other words, for the instruction "_outp(port#,byte)" the port# can take any address
value between 0000 and FFFFH. See Examples 11-8 and 11-9.


Table 11-5: I/O Operations in Microsoft Visual C++ (for Windows 98)

x86 Assembly     Visual C++
OUT port#,AL     _outp(port#,byte)
OUT DX,AL        _outp(port#,byte)
IN  AL,port#     _inp(port#)
IN  AL,DX        _inp(port#)


Example 11-8 


Write a Visual C++ program for Windows 98 to toggle all bits of PA and PB of the 8255 chip. 
Use the kbhit function to exit if there is a keypress. 
Solution: 
//Tested by Dan Bent
#include<conio.h>
#include<stdio.h>
#include<iostream.h>
#include<iomanip.h>
#include<windows.h>
void main()
{
  cout<<setiosflags(ios::unitbuf);    // clear screen buffer
  cout<<"This program toggles the bits for Port A and Port B.";
  _outp(0x303,0x80);                  //MAKE PA,PB of 8255 ALL OUTPUT
  do
  {
    _outp(0x300,0x55);                //SEND 55H TO PORT A
    _outp(0x301,0x55);                //SEND 55H TO PORT B
    _sleep(500);                      //DELAY of 500 msec.
    _outp(0x300,0xAA);                //NOW SEND AAH TO PA, and PB
    _outp(0x301,0xAA);
    _sleep(500);
  }
  while(!kbhit());
}




Example 11-9 


Write a Visual C++ program for Windows 98 to get a byte of data from PA and send it to both
PB and PC of the 8255 chip in PC Trainer.

Solution:

#include<conio.h>
#include<stdio.h>
#include<iostream.h>
#include<iomanip.h>
#include<windows.h>
#include<process.h>
//Tested by Dan Bent
void main()
{
  unsigned char mybyte;
  cout<<setiosflags(ios::unitbuf);    // clear screen buffer
  system("CLS");
  _outp(0x303,0x90);                  //PA=in, PB=out, PC=out
  _sleep(5);                          //wait 5 milliseconds
  mybyte=_inp(0x300);                 //get byte from PA
  _outp(0x301,mybyte);                //send to PB
  _sleep(5);
  _outp(0x302,mybyte);                //send to Port C
  _sleep(5);
  cout<<mybyte;                       //send to PC screen also
  cout<<"\n\n";
}


I/O programming in Linux C/C++ 


Linux is a popular operating system for the x86 PC. You can get a copy of the 
latest C/C++ compiler from http://gcc.gnu.org. Table 11-6 provides the C/C++ syntax for 
I/O programming in the Linux OS environment. 


Table 11-6: Input/Output Operations in Linux

x86 Assembly     Linux C/C++
OUT port#,AL     outb(byte,port#)
OUT DX,AL        outb(byte,port#)
IN  AL,port#     inb(port#)
IN  AL,DX        inb(port#)


Compiling and running Linux C/C++ programs with I/O functions 


To compile the I/O programs of Examples 11-10 and 11-11, the following points must
be noted.

1. To compile with a keypress loop, you must link to library ncurses as follows:

> gcc toggle.c -o toggle -lncurses

2. To run the program, you must either be root or root must change permissions on the
executable for hardware port access.

Example: (as root or superuser)

> chown root toggle

> chmod 4750 toggle


Now toggle can be executed by users other than root (with mode 4750, those in the
file's group). More information on this topic can be found at www.microdigitaled.com.




Example 11-10 


Write a C/C++ program for a PC with the Linux OS to toggle all bits of PA and PB of the 8255 
chip on the PC Trainer. Put a 500 ms delay between the “on” and “off” states. Pressing any key 
should exit the program. 


Solution: 


// This program demonstrates low level I/O
// using C language on a Linux based system.
// Tested by Nathan Noel
#include <stdio.h>      // for printf()
#include <unistd.h>     // for usleep()
#include <sys/io.h>     // for outb() and inb()
#include <ncurses.h>    // for console i/o functions
int main()
{
  int n=0;                  // temp char variable
  int delay=5e5;            // sleep delay variable

  ioperm(0x300,4,0x300);    // get port permission
  outb(0x80,0x303);         // send control word

  //--- begin ncurses setup ---------
  //--- (needed for console i/o) ----
  initscr();                // initialize screen for ncurses
  cbreak();                 // do not wait for carriage return
  noecho();                 // do not echo input character
  halfdelay(1);             // only wait 0.1 s (one tenth of a second)
                            // for input from keyboard
  //--- end ncurses setup -----------

  do                        // main toggle loop
  {
    printf("0x55 \n\r");    // display status to screen
    refresh();              // refresh() to update console
    outb(0x55,0x300);       // send 0x55 to PortA (01010101B)
    outb(0x55,0x301);       // send 0x55 to PortB (01010101B)
    usleep(delay);          // wait for 500ms (5e5 microseconds)
    printf("0xAA \n\r");    // display status to screen
    refresh();              // refresh() to update console
    outb(0xaa,0x300);       // send 0xAA to PortA (10101010B)
    outb(0xaa,0x301);       // send 0xAA to PortB (10101010B)
    usleep(delay);          // wait for 500ms
                            // get input from keyboard
    n=getch();              // if no keypress within 0.1 s, getch()
                            // returns ERR (-1) due to halfdelay()
  }
  while(n<=0);              // test for keypress
                            // if keypress, exit program
  endwin();                 // close program console for ncurses
  return 0;                 // exit program
}




Example 11-11 


Write a C/C++ program for a PC with the Linux OS to get a byte of data from port A and 
send it to both port B and port C of the 8255 in the PC Trainer. 


Solution: 


// This program gets data from Port A and
// sends a copy to both Port B and Port C.
// Tested by: Nathan Noel -- 2/10/2002
#include <stdio.h>
#include <unistd.h>
#include <sys/io.h>
#include <ncurses.h>

int main()
{
  int n=0;                  // temp variable
  int i=0;                  // temp variable

  ioperm(0x300,4,0x300);    // get permission to use ports
  outb(0x90,0x303);         // send control word for
                            // PortA=input, PortB=output, PortC=output

  initscr();                // initialize screen for ncurses
  cbreak();                 // do not wait for carriage return
  noecho();                 // do not echo input character
  halfdelay(1);             // only wait 0.1 s for input

  do                        // main loop
  {
    i=inb(0x300);           // get data from PortA
    usleep(1e5);            // sleep for 100ms
    outb(i,0x301);          // send data to PortB
    outb(i,0x302);          // send data to PortC
    n=getch();              // get input from keyboard
                            // if no keypress within 0.1 s, getch() returns ERR (-1)
  } while(n<=0);            // test for keypress
                            // if keypress, exit program
  endwin();                 // close program window
  return(0);                // exit program
}




Review Questions 


1. Find addresses for all 8255 ports if A7-A2 = 111101 is used to activate CS. 

2. Find the control word for an 8255 in mode 0 (simple I/O) if all the ports are config- 
ured as output ports. 

3. Find the control word for an 8255 in mode 0 (simple I/O) if all the ports are config- 
ured as input ports. 

4. Program an 8255 with the following specifications: All ports are output ports. Write 
55H to the ports. After a delay, switch them all to AAH. 

5. How are ports configured after the control register is loaded with 89H? 


PROBLEMS 


SECTION 11.1: 8088 INPUT/OUTPUT INSTRUCTIONS 


1. True or false. While memory contains both code and data, ports contain data only.

2. In the instruction “OUT 99H,AL”, the port address is:
(a) 8 bits (b) 16 bits (c) both (a) and (b) (d) none of the above

3. In the instruction “OUT DX,AL”, the port address is:
(a) 8 bits (b) 16 bits (c) either (a) or (b) (d) none of the above

4. True or false. In the instruction “IN AL,78H”, register AL is the destination.

5. Explain what the instruction “IN AL,5FH” does.

6. In the instruction “OUT DX,AL”, assume that AL = 3BH and DX = 300H. Explain
what the instruction does.


SECTION 11.2: I/O ADDRESS DECODING AND DESIGN 


7. In the execution of an OUT instruction, which control signal is activated?

8. In the execution of an IN instruction, which control signal is activated? 

9. True or false. Segment register DS is used to generate a port's physical address. 

10. True or false. In “OUT 65H,AL”, only address pins A0-A7 are used by the 8088 to
provide the address.

11. True or false. An input port is distinguished from an output port by the port address 
assigned to it. 

12. True or false. An input port is distinguished from an output port by the IOR and IOW 
control signals. 

13. A ____ (latch, tri-state buffer) is used in the design of input ports.

14. A ____ (latch, tri-state buffer) is used in the design of output ports.

15. ____ (IOR, IOW) is used in the design of input ports.

16. ____ (IOR, IOW) is used in the design of output ports.

17. Draw a logical design for “OUT 16H,AL” using AND and inverter gates in addition 
to a 74LS373. 

18. Draw a logical design for “IN AL,81H” using AND and inverter gates in addition to 
a 74LS244. 

19. Show one implementation of Problem 17 using NAND and inverter gates. Use as 
many as you need. 

20. Show one implementation of Problem 18 using NAND and inverter gates. Use as 
many as you need. 

21. True or false. Memory-mapped I/O uses control signals MEMR and MEMW. 

22. True or false. In memory-mapped I/O, one can perform logical and arithmetic opera- 
tions on the data without moving it into the accumulator first. 

23. Show the logical design of “MOV [0100],AL” for memory-mapped I/O using AND 
and inverter gates and a 74LS373 latch. Assume that DS = B800H. 

24. Why is memory-mapped I/O decoding more expensive? 




SECTION 11.3: I/O ADDRESS MAP OF x86 PCs 


25. Show the circuit connection to the PC bus for the following instructions. Use simple 
logic gates 74LS373 and 74LS244. 
(a) OUT 309H,AL (b) IN AL,30CH 

26. Repeat Problem 25 using a 74LS138 for the decoder. 

27. Show the design of an 8255 connection to the PC bus using simple logic gates. 
Assume port address 304H as the base port address for the 8255. 

28. Show the design of an 8255 connection to the PC bus using a 74LS138. Assume base 
address 31CH. 

29. In the IBM PC, how many port addresses are available in the address space common- 
ly referred to as prototype? 

30. Which one is more economical, linear address select or absolute address decoding? 

31. Explain address aliasing. 

32. Which one creates aliases, the linear address select or absolute address decoding? 

33. True or false. To design an IBM PC compatible system, one must follow the I/O map 
of the PC. 

34. In accessing ports in the PC, why must the AEN = 0 signal be used in decoding? 

35. What port address is used for a fixed delay in the x86 PC?

36. In the x86 PC, the ____ bit of port address ____H toggles every ____ microseconds.

37. In Problem 36, to get a 1/4 second delay, we need to load the CX with what value? 

38. In Problem 36, calculate the time delay if CX = 38,000. 


SECTION 11.4: PROGRAMMING AND INTERFACING THE 8255 


39. How many pins of the 8255 are used for ports, and how are they categorized? 

40. What is the function of data pins D0-D7 in the 8255?

41. What is the advantage of using the 8255 over the 74LS373 and 74LS244? 

42. True or false. All three ports, A, B, and C, can be programmed for simple I/O. 

43. True or false. In simple I/O programming of port A of the 8255, we can use PA0-PA3
for output and PA4-PA7 for an input port.

44. Show the decoding circuitry for the 8255 if we want port A to have address 68H. Use 
NAND and inverter gates. 

45. Which of the following port addresses cannot be assigned to port A of the 8255, and 
why? 

(a) 32H (b) 45H (c) 89H (d) BAH 

46. If 91H is the control word, indicate which port is input and which is output. 

47. Find the control word if PA = input, PB = input, and PC0-PC7 = output.

48. In the 8255, which mode is used if we want to simply send out data? 

49. Write a program to monitor PA for a temperature of 100. If it is equal, it should be 
saved in register BL. Also, send AAH to port B and 55H to port C. Use the port 
address of your choice. 

50. Write a program in Assembly language to get a byte of the data from PA, convert it to 
ASCII bytes, and store them in registers CL, AH, and AL. For example, an input of 
FFH will show as 255. (Note: FF in binary becomes 323535 in ASCII.) 


ANSWERS TO REVIEW QUESTIONS 


SECTION 11.1: 8088 INPUT/OUTPUT INSTRUCTIONS 


1. 256 input and 256 output ports 
2. 65,536 input and 65,536 output ports 




3. It sends the contents of 8-bit register AL to port address 24H. 


4.      MOV  DX,300H         ;LOAD THE PORT ADDR
        IN   AL,DX           ;GET THE DATA FROM PORT
        MOV  DX,304H         ;LOAD THE PORT ADDR
        OUT  DX,AL           ;SEND OUT THE DATA
5.      IN   AL,60H          ;GET DATA FROM PORT ADDRESS 60H
        MOV  CH,AL           ;GIVE A COPY TO CH REG


SECTION 11.2: I/O ADDRESS DECODING AND DESIGN


1. Latch, tri-state buffer

2. The CPU provides the data on the data bus only for a short amount of time. Therefore,
it must be latched before it is lost.

3. Assuming that OC = 0, the input data is transferred from D to Q when G goes from
low to high, making it available right away; but it is actually latched when G goes
from high to low. This reduces the time delay from D to Q.

4. Low

5. True

6. MEMW* for output and MEMR* for input devices


SECTION 11.3: I/O ADDRESS MAP OF x86 PCs 


1. 300-31FH 

2. AEN =0 

3. The I/O devices are programmed by the CPU; therefore, with AEN = 0 it will make 
sure that the I/O device is accessed by the addresses provided by the CPU and not the 
DMA. AEN is active low when the CPU is using the buses. 

4. D4 bit 

5. 25,000 x 15.085 us = 377,125 us = 377.125 ms


SECTION 11.4: PROGRAMMING AND INTERFACING THE 8255 


1. F4H, F5H, F6H, and F7H for PA, PB, PC, and control register, respectively

2. 80H (see Figure 11-12)

3. 9BH (see Figure 11-12)

4.      MOV  AL,80H
        OUT  CONTREG,AL
        MOV  AL,55H
        OUT  PORTA,AL
        OUT  PORTB,AL
        OUT  PORTC,AL
        CALL DELAY
        MOV  AL,0AAH
        OUT  PORTA,AL
        OUT  PORTB,AL
        OUT  PORTC,AL

5. All are simple I/O. PA and PB are both out. PC0-PC3 and PC4-PC7 are both in (see
Figure 11-12).




CHAPTER 12 


INTERFACING TO LCD, MOTOR, ADC, AND SENSOR


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Diagram the interfacing of a PC to an LCD, and code the corresponding
   programs in Assembly and C/C++
>> Diagram the interfacing of a PC to a stepper motor, and code the
   corresponding programs in Assembly and C/C++
>> Diagram the interfacing of a PC to a DAC (digital-to-analog converter)
   device, and code the corresponding programs in Assembly and C/C++
>> Diagram the interfacing of a PC to an ADC (analog-to-digital converter)
   device, and code the corresponding programs in Assembly and C/C++
>> Show the interfacing of ADC devices to sensors




In this chapter we show PC interfacing to some real-world devices such as an 
LCD, stepper motor, ADC and DAC devices, and sensors. Section 12.1 describes interfac- 
ing and programming of an LCD. In Section 12.2, stepper motor interfacing is described. 
DAC (digital-to-analog converter) interfacing to PC is shown in Section 12.3 and ADC 
(analog-to-digital converter) interfacing to PC is shown in Section 12.4. Sensors, such as 
temperature sensors, and their interfacing are also described in Section 12.4. 


The Assembly language programs in this chapter can only be used with x86 sys- 
tems with the PC104 bus and operating systems that support Assembly language pro- 
gramming. 


We are in the process of designing a USB-I/O Trainer which works with C++, 
C#, and Visual Basic. This trainer will have 24 pins for general-purpose I/O and two 
analog-to-digital converters. See www.MicroDigitalEd.com for more information. 


SECTION 12.1: INTERFACING TO AN LCD 


This section describes the operation modes of LCDs, then describes how to program and
interface an LCD to a PC via an 8255.


LCD operation


In recent years the LCD is replacing LEDs (seven-segment LEDs or other multisegment
LEDs). This is due to the following reasons:

1. The declining prices of LCDs.

2. The ability to display numbers, characters, and graphics. This is in contrast to LEDs,
which are limited to numbers and a few characters.

3. Incorporation of the refreshing controller into the LCD itself, thereby relieving the
CPU of the task of refreshing the LCD. In the case of the LED, it must be refreshed by
the CPU (or in some other way) to keep displaying the data.

4. Ease of programming for both characters and graphics.


LCD pin descriptions


The LCD discussed in this section has 14 pins. The function of each pin is given
in Table 12-1. Figure 12-1 shows the pin positions for various LCDs.


Table 12-1: Pin Descriptions for LCD

Pin   Symbol   Description
1     VSS      Ground
2     VCC      +5V power supply
3     VEE      Power supply to control contrast
4     RS       RS = 0 to select command register, RS = 1 to select data register
5     R/W      R/W = 0 for write, R/W = 1 for read
6     E        Enable
7     D0       The 8-bit data bus
8     D1       The 8-bit data bus
9     D2       The 8-bit data bus
10    D3       The 8-bit data bus
11    D4       The 8-bit data bus
12    D5       The 8-bit data bus
13    D6       The 8-bit data bus
14    D7       The 8-bit data bus


VCC, VSS, and VEE: While VCC and VSS provide +5V and ground, respectively,
VEE is used for controlling the LCD contrast.

RS, register select: There are two registers inside the LCD and the RS pin is used
for their selection as follows. If RS = 0, the instruction command code register is selected,
allowing the user to send a command such as clear display, cursor at home, and so on.
If RS = 1, the data register is selected, allowing the user to send data to be displayed on
the LCD (or data to be retrieved).

R/W, read/write: R/W input allows the user to write information into the LCD or
read information from it. R/W = 1 when reading and R/W = 0 when writing.


Figure 12-1. Pin Positions for Various LCDs from Optrex
(Three pin-position drawings, labeled as in the source: DMC1610A, DMC1606C, DMC16117,
DMC16128, DMC16129, DMC1616433, DMC20434; DMC16106B, DMC16207, DMC16230, DMC20215,
DMC32216; DMC20261, DMC24227, DMC24138, DMC32132, DMC32239, DMC40131, DMC40218)


E, enable: The enable 
pin is used by the LCD to latch 
information presented to its data 
pins. When data is supplied to 
data pins, a high-to-low pulse 
must be applied to this pin in 
order for the LCD to latch in the 
data present at the data pins. This 
pulse must be a minimum of 450 
ns wide. 

D0-D7: The 8-bit data 
pins are used to send information 
to the LCD or read the contents of 
the LCD's internal registers. 


To display letters and numbers, we send ASCII codes for the letters A-Z, a-z, and
numbers 0-9 to these pins while making RS = 1.

There are also instruction command codes that can be sent to the LCD in order to clear
the display, force the cursor to the home position, or blink the cursor. Table 12-2 lists
the instruction command codes.


Table 12-2: LCD Command Codes

Code (Hex)   Command to LCD Instruction Register
1            Clear display screen
2            Return home
4            Decrement cursor (shift cursor to left)
6            Increment cursor (shift cursor to right)
5            Shift display right
7            Shift display left
8            Display off, cursor off
A            Display off, cursor on
C            Display on, cursor off
E            Display on, cursor on
F            Display on, cursor blinking
10           Shift cursor position to left
14           Shift cursor position to right
18           Shift the entire display to the left
1C           Shift the entire display to the right
80           Force cursor to beginning of 1st line
C0           Force cursor to beginning of 2nd line
38           2 lines and 5 x 7 matrix

Note: This table is extracted from Table 12-4.


Sending commands to LCDs



To send any of the commands from Table 12-2 to the LCD, make pin RS = 0 and 
send a high-to-low pulse to the E pin to enable the internal latch of the LCD. The con- 
nection of an 8255 to an LCD is shown in Figure 12-2. 


CHAPTER 12: INTERFACING TO LCD, MOTOR, ADC, AND SENSOR 


317 


Figure 12-2. 8255 Connection to LCD


Notice the following for the connection in Figure 12-2:

1. The LCD's data pins are connected to Port A of the 8255.
2. The LCD's RS pin is connected to PB0 of Port B of the 8255.
3. The LCD's R/W pin is connected to PB1 of Port B of the 8255.
4. The LCD's E pin is connected to PB2 of Port B of the 8255.
5. Both Ports A and B are configured as output ports.
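
Given the wiring listed above, the byte written to Port B is just a combination of three bits. The following C sketch, with hypothetical macro and function names, shows how the control bytes for the high and low phases of the E pulse could be formed.

```c
#include <stdint.h>

/* Bit positions on Port B, taken from the connection list above. */
#define LCD_RS 0x01   /* PB0: 0 = command register, 1 = data register */
#define LCD_RW 0x02   /* PB1: 0 = write, 1 = read */
#define LCD_E  0x04   /* PB2: enable, pulsed high-to-low to latch */

/* Byte to write to Port B for the given RS, R/W, and E levels. */
uint8_t lcd_control(int rs, int rw, int e)
{
    uint8_t b = 0;
    if (rs) b |= LCD_RS;
    if (rw) b |= LCD_RW;
    if (e)  b |= LCD_E;
    return b;
}
```

Writing a command therefore uses 04H followed by 00H on Port B (RS = 0, E high then low), and writing a character uses 05H followed by 01H (RS = 1).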


;The following sends all the necessary commands to the LCD

        MOV  AL,38H          ;initialize LCD for 2 lines & 5x7 matrix
        CALL COMNDWRT        ;write the command to LCD
        CALL DELAY           ;wait before issuing the next command
        CALL DELAY           ;this command needs lots of delay
        CALL DELAY
        MOV  AL,0EH          ;send command for LCD on, cursor on
        CALL COMNDWRT        ;write the command to LCD
        CALL DELAY           ;wait before issuing the next command
        MOV  AL,01           ;clear LCD
        CALL COMNDWRT
        CALL DELAY           ;wait
        MOV  AL,06           ;command for shifting cursor right
        CALL COMNDWRT
        CALL DELAY           ;wait


COMNDWRT PROC                ;this procedure writes commands to LCD
        PUSH DX              ;save DX
        MOV  DX,PORTA
        OUT  DX,AL           ;send the code to Port A
        MOV  DX,PORTB        ;Port B address
        MOV  AL,00000100B    ;RS=0,R/W=0,E=1 for H-TO-L pulse
        OUT  DX,AL           ;to Port B
        NOP                  ;wait for high-to-low pulse to be
        NOP                  ;wide enough
        MOV  AL,00000000B    ;RS=0,R/W=0,E=0 for H-TO-L pulse
        OUT  DX,AL
        POP  DX              ;restore DX
        RET                  ;return to caller
COMNDWRT ENDP


In the above program, we must wait before issuing the next command; otherwise,
it will jam the LCD. A delay of 20 ms should work fine. We can use the port 61H delay
generation shown in Chapter 11. The DELAY procedure is listed below, after the
data-writing code.


Sending data to the LCD 


In order to send data to the LCD to be displayed, we must set pin RS = 1, and also 
send a high-to-low pulse to the E pin to enable the internal latch of the LCD. The follow- 
ing code sends characters to the LCD. Again, it places sufficient time delays between each 
data issue to ensure that the LCD is ready for new data. 


        MOV  AL,'Y'        ;display 'Y' letter
        CALL DATWRIT       ;issue it to LCD
        CALL DELAY         ;wait before issuing the next character
        MOV  AL,'E'        ;display 'E' letter
        CALL DATWRIT       ;issue it to LCD
        CALL DELAY         ;wait before issuing the next character
        MOV  AL,'S'        ;display 'S' letter
        CALL DATWRIT       ;issue it to LCD
        CALL DELAY         ;wait


;data write to LCD without checking the busy flag
;AL=char sent to LCD

DATWRIT PROC
        PUSH DX            ;save DX
        MOV  DX,PORTA      ;DX=port A address
        OUT  DX,AL         ;issue the char to LCD
        MOV  AL,00000101B  ;RS=1, R/W=0, E=1 for H-to-L pulse
        MOV  DX,PORTB      ;port B address
        OUT  DX,AL         ;make enable high
        MOV  AL,00000001B  ;RS=1, R/W=0, and E=0 for H-to-L pulse
        OUT  DX,AL
        POP  DX
        RET
DATWRIT ENDP


;delay generation using the PB4 bit of port 61H 


DELAY   PROC
        MOV  CX,1325       ;1325 x 15.085 usec = 20 msec
        PUSH AX
W1:     IN   AL,61H
        AND  AL,00010000B
        CMP  AL,AH
        JE   W1
        MOV  AH,AL
        LOOP W1
        POP  AX
        RET
DELAY   ENDP




Checking LCD busy flag 


The above programs used a time delay before issuing the next data or command.
This allows the LCD a sufficient amount of time to get ready to accept the next data.
However, the LCD has a busy flag. We can monitor the busy flag and issue data when the
LCD is ready. This will speed up the process. To check the busy flag, we must read the
command register (R/W = 1, RS = 0). The busy flag is the D7 bit of that register. Therefore,
with R/W = 1 and RS = 0, when D7 = 1 (busy flag = 1), the LCD is busy taking care of
internal operations and will not accept any new information. When D7 = 0, the LCD is ready
to receive new information. The LCD manufacturer's data sheet recommends monitoring
the busy flag before sending data or command codes to the LCD. This ensures
that the LCD is ready to receive data. See the code below.


;writing to LCD with checking the busy flag, AL=char

        MOV  AL,38H        ;initialize LCD for 2 lines & 5x7
        CALL COMNDWRT      ;write the command to LCD
        MOV  AL,0EH        ;send command for LCD on, cursor on
        CALL COMNDWRT      ;write the command to LCD
        MOV  AL,01         ;clear LCD
        CALL COMNDWRT
        MOV  AL,06         ;command for shifting cursor right
        CALL COMNDWRT
        MOV  AL,'Y'        ;display 'Y' letter
        CALL DATWRT        ;issue it to LCD
        MOV  AL,'E'        ;display 'E' letter
        CALL DATWRT        ;issue it to LCD
        MOV  AL,'S'        ;display 'S' letter
        CALL DATWRT        ;issue it to LCD


DATWRT  PROC
        CALL LCDREADY
        PUSH DX            ;save DX
        MOV  DX,PORTA      ;DX=port A address
        OUT  DX,AL         ;issue the char to LCD
        MOV  AL,00000101B  ;RS=1, R/W=0, E=1 for H-to-L pulse
        MOV  DX,PORTB      ;port B address
        OUT  DX,AL         ;make enable high
        NOP
        NOP
        MOV  AL,00000001B  ;RS=1, R/W=0, and E=0 for H-to-L
        OUT  DX,AL
        POP  DX
        RET
DATWRT  ENDP


COMNDWRT PROC
        CALL LCDREADY
        PUSH DX            ;save DX
        MOV  DX,PORTA
        OUT  DX,AL         ;send the code to Port A
        MOV  DX,PORTB      ;Port B address
        MOV  AL,00000100B  ;RS=0, R/W=0, E=1 for H-to-L pulse
        OUT  DX,AL         ;to Port B
        NOP                ;wait for high-to-low pulse to be
        NOP                ;wide enough
        MOV  AL,00000000B  ;RS=0, R/W=0, E=0 for H-to-L pulse
        OUT  DX,AL
        POP  DX            ;restore DX
        RET                ;return to caller
COMNDWRT ENDP


LCDREADY PROC
        PUSH AX
        PUSH DX
        MOV  AL,90H        ;PA=input to read LCD status, PB=OUT
        MOV  DX,CNTPORT    ;DX=control port address
        OUT  DX,AL         ;issue to control reg
        MOV  AL,00000110B  ;RS=0 (busy flag is in the command register),
                           ;R/W=1, E=1 (L-to-H for E)
        MOV  DX,PORTB      ;port B address
        OUT  DX,AL         ;issue it to port B
        MOV  DX,PORTA      ;port A address
AGAIN:  IN   AL,DX         ;read command reg; busy flag is D7
        ROL  AL,1          ;send busy flag to carry flag
        JC   AGAIN         ;if CF=1 LCD not ready, try again
        MOV  AL,80H        ;make PA=OUT to send character
        MOV  DX,CNTPORT    ;DX=control port address
        OUT  DX,AL         ;issue to 8255's control reg
        POP  DX
        POP  AX
        RET
LCDREADY ENDP
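
The busy-flag test performed by LCDREADY (rotate D7 into the carry flag and loop while it is set) can also be written as a mask test. The sketch below is an illustration only, under the assumption that the caller supplies a function that returns the byte read from the LCD command register; lcd_is_busy, lcd_wait_ready, and the read_status parameter are hypothetical names. Unlike the assembly loop, this version gives up after a fixed number of polls, which is a design choice of this sketch and not part of the original routine.

```c
#include <stdint.h>

#define LCD_BUSY_MASK 0x80u   /* D7 of the command register is the busy flag */

/* Return 1 if the status byte shows the LCD is busy, 0 if it is ready. */
int lcd_is_busy(uint8_t status)
{
    return (status & LCD_BUSY_MASK) != 0;
}

/* Poll a caller-supplied status reader until the LCD reports ready.
   Returns 1 when ready, 0 if max_polls reads all showed busy. */
int lcd_wait_ready(uint8_t (*read_status)(void), unsigned max_polls)
{
    while (max_polls--) {
        if (!lcd_is_busy(read_status()))
            return 1;
    }
    return 0;
}
```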


LCD cursor position 


In the LCD, one can put data at any location. For the 20 x 2 LCD, the address for
the first location of line 1 is 80H, and for line 2 it is C0H. The following shows address
locations and how they are accessed:

RS   R/W   DB7   DB6   DB5   DB4   DB3   DB2   DB1   DB0
0    0     1     A     A     A     A     A     A     A


where AAAAAAA = 0000000 to 0100111 for line 1 and AAAAAAA = 1000000 to
1100111 for line 2. See Table 12-3. The upper address range can go as high as 0100111
for the 40-character-wide LCD, while for the 20-character-wide LCD it goes up to 0010011
(19 decimal = 10011 binary). Notice that the upper range 0100111 (binary) = 39 decimal,
which corresponds to locations 0 to 39 for the LCDs of 40 x 2 size. From the above
discussion we can get the addresses of cursor positions for various sizes of LCDs. See Figure
12-3. Note that all the addresses are in hex.


As an example of setting the cursor at the fourth location of line 1 we have the 
following: 


MOV AL, 83H ;LINE 1 POSITION 4 
CALL COMNDWRT 




and for the sixth location of the second line we have: 


MOV AL, 0C5H ;LINE 2 POSITION 6
CALL COMNDWRT 


Notice that since the location addresses are in hex, 0 is the first location. 
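
The two cursor examples above follow one rule: set DB7, add 40H for line 2, and add the zero-based column. The following sketch shows that rule under our own assumptions: the function name lcd_cursor_cmd and its parameters are hypothetical, only two-line LCDs up to 40 characters wide are handled, and 0 is returned for an invalid position.

```c
#include <stdint.h>

/* Hypothetical helper: build the "set cursor position" command byte.
   line   : 1 or 2
   column : zero-based position within the line
   width  : characters per line of the LCD (at most 40)
   Returns 0 if the position is outside the display. */
uint8_t lcd_cursor_cmd(unsigned line, unsigned column, unsigned width)
{
    if (line < 1 || line > 2 || width > 40 || column >= width)
        return 0;
    return (uint8_t)(0x80u | ((line == 2) ? 0x40u : 0x00u) | column);
}
```

For example, line 1, column 3 (the fourth location) gives 83H, and line 2, column 5 (the sixth location) gives C5H, matching the values loaded into AL above.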


LCD programming in Visual C/C++ 


In Chapter 11 we showed how to program the x86 PC I/O port using C/C++ for 
MS Visual C/C++. Example 12-1 shows LCD programming using Visual C/C++. 


LCD timing and data sheet 


Figures 12-4 and 12-5 show timing diagrams for LCD write and read timing, 
respectively. Notice that the write operation happens on the H-to-L pulse of the E pin 
while the read is activated on the L-to-H pulse of the E pin. Table 12-4 provides a more 
detailed list of LCD instructions. 


Table 12-3: LCD Addressing

               DB7   DB6   DB5   DB4   DB3   DB2   DB1   DB0
Line 1 (min)    1     0     0     0     0     0     0     0
Line 1 (max)    1     0     1     0     0     1     1     1
Line 2 (min)    1     1     0     0     0     0     0     0
Line 2 (max)    1     1     1     0     0     1     1     1


16 x 2 LCD   80  81  82  83  84  85  86  through 8F
             C0  C1  C2  C3  C4  C5  C6  through CF
20 x 1 LCD   80  81  82  83  through 93
20 x 2 LCD   80  81  82  83  through 93
             C0  C1  C2  C3  through D3
20 x 4 LCD   80  81  82  83  through 93
             C0  C1  C2  C3  through D3
             94  95  96  97  through A7
             D4  D5  D6  D7  through E7
40 x 2 LCD   80  81  82  83  through A7
             C0  C1  C2  C3  through E7

Note: All data is in hex.
Figure 12-3. Cursor Addresses for Some LCDs


Example 12-1


Write a Visual C/C++ program to display “Hello” on line 1 starting at the sixth position.


Solution: 


#include<conio.h>
#include<stdio.h>
#include<string.h>                  //for strlen()
#include<iostream.h>
#include<iomanip.h>
#include<windows.h>
//tested by Dan Bent
void main()
{
    unsigned int i;
    char message[] = "Hello";       //sized by the compiler, including the null
    cout<<setiosflags(ios::unitbuf);

    _outp(0x303,0x80);              //control word for Ports A, B, C
    _outp(0x300,0x38);              //initialize LCD for 2 lines & 5x7 matrix
    _outp(0x301,0x04);              //RS=0,R/W=0,E=1 for H-to-L pulse
    _outp(0x301,0x00);              //RS=0,R/W=0,E=0 for H-to-L pulse
    _sleep(500);                    //delay 500 milliseconds

    _outp(0x300,0x0E);              //send command for LCD on, cursor on
    _outp(0x301,0x04);              //RS=0,R/W=0,E=1 for H-to-L pulse
    _outp(0x301,0x00);              //RS=0,R/W=0,E=0 for H-to-L pulse
    _sleep(250);                    //delay 250 milliseconds

    _outp(0x300,0x01);              //clear LCD
    _outp(0x301,0x04);              //RS=0,R/W=0,E=1 for H-to-L pulse
    _outp(0x301,0x00);              //RS=0,R/W=0,E=0 for H-to-L pulse
    _sleep(250);

    _outp(0x300,0x06);              //shift cursor right
    _outp(0x301,0x04);              //RS=0,R/W=0,E=1 for H-to-L pulse
    _outp(0x301,0x00);              //RS=0,R/W=0,E=0 for H-to-L pulse
    _sleep(250);

    _outp(0x300,0x85);              //move cursor to line 1, sixth position
    _outp(0x301,0x04);              //RS=0,R/W=0,E=1 for H-to-L pulse
    _outp(0x301,0x00);              //RS=0,R/W=0,E=0 for H-to-L pulse
    _sleep(250);

    //write data to LCD
    for(i=0;i<strlen(message);i++)
    {
        _outp(0x300,(int)message[i]);
        _outp(0x301,0x05);          //RS=1,R/W=0,E=1 for H-to-L pulse
        _outp(0x301,0x01);          //RS=1,R/W=0,E=0 for H-to-L pulse
        _sleep(250);                //delay 250 milliseconds
    }
}




Review Questions 


1. The RS pin is an (input, output) pin for the LCD. 
2. The E pin is an (input, output) pin for the LCD.
3. The E pin requires an (H-to-L, L-to-H) pulse to latch in information at the 
data pins of the LCD. 
4. For the LCD to recognize information at the data pins as data, RS must be set to 
(high, low). 


5. Give the command codes for line 1, first character, and line 2, first character. 


tPWH = Enable pulse width = 450 ns (minimum)
tDSW = Data setup time = 195 ns (minimum)
tH = Data hold time = 10 ns (minimum)
tAS = Setup time prior to E (going high) for both RS and R/W = 140 ns (minimum)
tAH = Hold time after E has come down for both RS and R/W = 10 ns (minimum)


Figure 12-4. LCD Write Timing


tD = Data output delay time
tAS = Setup time prior to E (going high) for both RS and R/W = 140 ns (minimum)
tAH = Hold time after E has come down for both RS and R/W = 10 ns (minimum)


Note: Read requires an L-to-H pulse for the E pin.


Figure 12-5. LCD Read Timing




Table 12-4: List of LCD Instructions

Instruction                   RS  R/W  DB7 DB6 DB5 DB4 DB3 DB2 DB1 DB0

Clear Display                 0   0    0   0   0   0   0   0   0   1
    Clears entire display and sets DD RAM address 0 in address counter.

Return Home                   0   0    0   0   0   0   0   0   1   -
    Sets DD RAM address 0 as address counter. Also returns display being
    shifted to original position. DD RAM contents remain unchanged.

Entry Mode Set                0   0    0   0   0   0   0   1   I/D S
    Sets cursor move direction and specifies shift of display. These
    operations are performed during data write and read.

Display On/Off Control        0   0    0   0   0   0   1   D   C   B
    Sets On/Off of entire display (D), cursor On/Off (C), and blink of
    cursor position character (B).

Cursor or Display Shift       0   0    0   0   0   1   S/C R/L -   -
    Moves cursor and shifts display without changing DD RAM contents.

Function Set                  0   0    0   0   1   DL  N   F   -   -
    Sets interface data length (DL), number of display lines (N), and
    character font (F).

Set CG RAM Address            0   0    0   1   ACG (6 bits)
    Sets CG RAM address. CG RAM data is sent and received after this setting.

Set DD RAM Address            0   0    1   ADD (7 bits)
    Sets DD RAM address. DD RAM data is sent and received after this setting.

Read Busy Flag & Address      0   1    BF  AC (7 bits)
    Reads Busy flag (BF) indicating internal operation is being performed
    and reads address counter contents.

Write Data to CG or DD RAM    1   0    Write data (8 bits)
    Writes data into DD or CG RAM.

Read Data from CG or DD RAM   1   1    Read data (8 bits)
    Reads data from DD or CG RAM.

Notes:
1. Execution times are maximum times when fcp or fosc is 250 kHz.
2. Execution time changes when frequency changes. Example: When fcp or fosc is
   270 kHz: 40 us x 250 / 270 = 37 us.

DD RAM    Display data RAM
CG RAM    Character generator RAM
ACG       CG RAM address
ADD       DD RAM address, corresponds to cursor address
AC        Address counter used for both DD and CG RAM addresses

I/D = 1   Increment;             I/D = 0   Decrement
S = 1     Accompanies display shift
S/C = 1   Display shift;         S/C = 0   Cursor move
R/L = 1   Shift to the right;    R/L = 0   Shift to the left
DL = 1    8 bits;                DL = 0    4 bits
N = 1     2 lines;               N = 0     1 line
F = 1     5 x 10 dots;           F = 0     5 x 7 dots
BF = 1    Internal operation;    BF = 0    Can accept instruction




SECTION 12.2: INTERFACING TO A STEPPER MOTOR 


This section begins with an overview of the basic operation of stepper motors. 
Then we describe how to interface a stepper motor to the PC. Finally, we use Assembly 
language programs to demonstrate control of the angle and direction of stepper motor 
rotation. 


See www.MicroDigitalEd.com for the USB-I/O version of this section.


Stepper motors 


A stepper motor is a widely used device that translates electrical pulses into
mechanical movement. In applications such as disk drives, dot matrix printers, and
robotics, the stepper motor is used for position control. Every stepper motor has a
permanent magnet rotor (also called the shaft) surrounded by a stator (see Figure 12-6).
The most common stepper motors have four stator windings that are paired with a
center-tapped common as shown in Figure 12-7. This type of stepper motor is commonly
referred to as a four-phase stepper motor. The center tap allows the change of current
direction in each of two coils when a winding is grounded, which results in a polarity
change of the stator. Notice that while a conventional motor shaft runs freely, the stepper
motor shaft moves in a fixed repeatable increment, which allows one to move it to a
precise position. This repeatable fixed movement is possible as a result of the basic magnet
theory where poles of the same polarity repel and opposite poles attract. The direction of
the rotation is dictated by the stator poles. The stator poles are determined by the current
sent through the wire coils. As the direction of current is changed, the polarity is also
changed, causing the reverse motion of the rotor. The stepper motor discussed here has a
total of six leads: four leads representing the four stator windings and two commons for
the center-tapped leads. As the sequence of power is applied to each stator winding, the
rotor will rotate. There are several widely used sequences, each with a different degree
of precision. Table 12-5 shows the normal 4-step sequence.

Figure 12-6. Rotor Alignment (rotor positions relative to the average north and average
south stator poles)

It must be noted that although we can start with any of the sequences in Table
12-5, once we start we must continue in the proper order. For example, if we start with
step 3 (0110) we must continue in the sequence of steps 4, 1, 2, and so on.

Figure 12-7. Stator Windings Configuration (windings A, B, C, and D with two center-tap
COM leads)


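Table 12-5: Normal 4-Step Sequence

Step #   Winding A   Winding B   Winding C   Winding D
  1          1           0           0           1
  2          1           1           0           0
  3          0           1           1           0
  4          0           0           1           1

Clockwise: steps in the order 1, 2, 3, 4. Counterclockwise: steps in the order 4, 3, 2, 1.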


Step angle 


How much movement is asso- 
ciated with a single step? This 
depends on the internal construction 
of the motor, in particular the number 
of teeth on the stator and the rotor. The 
step angle is the minimum degree of 
rotation associated with a single step. 
Various motors have different step 
angles. Table 12-6 shows some step angles for various motors. In Table 12-6, notice the 
term steps per revolution. This is the total number of steps needed to rotate one complete 
rotation or 360 degrees (e.g., 180 steps x 2 degrees = 360). 
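
The relation between step angle and steps per revolution is a single division, sketched below. The function name steps_per_revolution is our own, and rounding to the nearest whole step is an assumption made so that fractional step angles such as 1.8 degrees come out as whole numbers.

```c
/* Hypothetical helper: steps needed for one full 360-degree rotation.
   Returns 0 for a non-positive step angle. */
unsigned steps_per_revolution(double step_angle_deg)
{
    if (step_angle_deg <= 0.0)
        return 0;
    /* Add 0.5 before truncating so the result is rounded to the nearest step. */
    return (unsigned)(360.0 / step_angle_deg + 0.5);
}
```

With a 2-degree step angle this gives the 180 steps used in the text.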

It must be noted that perhaps contrary to one's initial impression, a stepper motor 
does not need to have more terminal leads for the stator to achieve smaller steps. All the 
stepper motors discussed in this section have four leads for the stator winding and two 
com wires for the center tap. Although some manufacturers have set aside only one lead 
for the common signal instead of two, they always have four leads for the stators. 

With this background on stepper motors, next we see how we can interface them 
with the PC. 


Stepper motor connection and programming 


Example 12-2 shows the programming of the stepper motor as connected in 
Figure 12-8. Study this example very carefully since it contains some very important 
points on motor interfacing. 




Example 12-2 


Describe the 8255 connection to the stepper motor of Figure 12-8 and code a program to rotate 
it continuously. 


Solution: 


The following steps show the 8255 connection to the stepper motor and its programming. 


1. Use an ohmmeter to measure the resistance of the leads. This should identify which COM
   leads are connected to which winding leads.

2. The common wire(s) are connected to the positive side of the motor's power supply. In many
   motors, +5 V is sufficient.

3. The four leads of the stator winding are controlled by the four bits of port A (PA0-PA3).
   However, since the 8255 lacks sufficient current to drive the stepper motor windings, we
   must use a driver such as the ULN2003 to energize the stator. Instead of the ULN2003, we
   could use transistors as drivers. However, notice that if transistors are used as drivers, we
   must also use diodes to take care of inductive current generated when the coil is turned off.
   One reason that the ULN2003 is preferable to the use of transistors as drivers is that the
   ULN2003 has an internal diode to take care of back EMF.


        MOV  AL,80H
        MOV  DX,CNTRLPORT  ;LOAD CONTROL PORT ADDRESS
        OUT  DX,AL         ;PORT A AS OUTPUT
        MOV  BL,33H
AGAIN:  MOV  AH,01
        INT  16H           ;CHECK KEY PRESS
        JNZ  EXIT          ;EXIT UPON KEY PRESS
        MOV  AL,BL
        MOV  DX,PORTA
        OUT  DX,AL
        MOV  CX,20000
HERE:   LOOP HERE          ;DELAY
        ROR  BL,1
        JMP  AGAIN
EXIT:


In the above program we are sending the values 33H, 99H, CCH, and 66H (33H rotated
right one bit at a time) to the stepper motor continuously. The motor keeps moving unless
a key is pressed.
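
The byte values sent by the program come from rotating 33H right one bit at a time, which is what the ROR instruction does to BL on each pass. The following sketch reproduces that rotation in C so the sequence can be checked by hand; ror8 and stepper_sequence are hypothetical helper names, not library functions.

```c
#include <stdint.h>

/* Rotate an 8-bit value right by one bit (bit 0 wraps around to bit 7). */
uint8_t ror8(uint8_t v)
{
    return (uint8_t)((v >> 1) | ((v & 1u) << 7));
}

/* Fill out[0..n-1] with the bytes produced by starting at 'start'
   and rotating right once per step. */
void stepper_sequence(uint8_t start, uint8_t out[], unsigned n)
{
    unsigned i;
    for (i = 0; i < n; i++) {
        out[i] = start;
        start = ror8(start);
    }
}
```

Starting from 33H, four rotations return to 33H, so the pattern repeats every four steps.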

By changing the value of DELAY, we can change the speed of rotation. In your program 
use a fixed time delay. The fixed time delay generation was shown in Chapter 11. 


Steps per second and RPM relation 


The relationship between the RPM (revolutions per minute), steps per revolution, 
and steps per second is intuitive and is as follows. 


Steps per second = (RPM x Steps per revolution) / 60
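
As a quick check of the formula, the sketch below computes the step rate for a given speed. The function name steps_per_second is our own choice for illustration.

```c
/* Hypothetical helper: step rate needed to turn at 'rpm' revolutions
   per minute for a motor with 'steps_per_rev' steps per revolution. */
double steps_per_second(double rpm, unsigned steps_per_rev)
{
    return rpm * steps_per_rev / 60.0;
}
```

For instance, a motor with 180 steps per revolution turning at 30 RPM needs 90 steps per second.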


Figure 12-8. 8255 Connection to Stepper Motor (ULN2003 driver between the 8255 and the
stepper motor; ULN2003 pin 8 = GND, pin 9 = +5 V; use a separate power supply for the
motor)


The four-step sequence and number of teeth on rotor


The switching sequence shown above in Table 12-5 is called the 4-step switching 
sequence since after four steps the same two windings will be “ON”. How much move- 
ment is associated with these four steps? After completing every four steps, the rotor 
moves only one tooth pitch. Therefore, in a stepper motor with 200 steps per revolution, 
its rotor has 50 teeth, since 4 x 50 = 200 steps are needed to complete one revolution. This 
leads to the conclusion that the minimum step angle is always a function of the number of 
teeth on the rotor. In other words, the smaller the step angle, the more teeth the rotor pass- 
es. See Example 12-3. 


Example 12-3 


Give the number of times the 4-step sequence in Table 12-5 must be applied to a 
stepper motor to make an 80-degree move if the motor has a 2-degree step angle. 


Solution: 


A motor with a 2-degree step angle has the following characteristics: 

Step angle: 2 degrees Steps per revolution: 180 

Number of rotor teeth: 45 Movement per 4-step sequence: 8 degrees 

To move the rotor 80 degrees, we need to send 10 four-step sequences consecutively, since 10 
x 4 steps x 2 degrees = 80 degrees. 
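
The arithmetic of Example 12-3 can be sketched as a small planning routine. The struct move_plan and the function plan_move are hypothetical names, and whole-degree integer arithmetic is assumed for simplicity.

```c
/* Result of planning a move with complete 4-step sequences. */
struct move_plan {
    unsigned sequences;     /* number of complete 4-step sequences */
    unsigned leftover_deg;  /* degrees not covered by complete sequences */
};

/* Hypothetical helper: how many complete 4-step sequences fit in a move
   of 'move_deg' degrees for a motor with a 'step_deg' step angle. */
struct move_plan plan_move(unsigned move_deg, unsigned step_deg)
{
    struct move_plan p = {0, move_deg};
    unsigned per_sequence = 4 * step_deg;   /* degrees per 4-step sequence */
    if (per_sequence != 0) {
        p.sequences = move_deg / per_sequence;
        p.leftover_deg = move_deg % per_sequence;
    }
    return p;
}
```

An 80-degree move with a 2-degree step angle gives 10 sequences with nothing left over, while a 45-degree move leaves 5 degrees that whole 2-degree steps cannot cover exactly.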


Looking at Example 12-3, one might wonder what happens if we want to move 
45 degrees since the steps are 2 degrees each. To allow for finer resolutions, all stepper 
motors allow what is called an 8-step switching sequence. The 8-step sequence is also 
called half-stepping since in following the 8-step sequence each step is half of the normal 
step angle. For example, a motor with a 2-degree step angle can be used as a 1-degree step 
angle if the sequence of Table 12-7 is applied. 

Example 12-4 shows the C version of the program to turn the stepper motor clock- 
wise. 




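Table 12-7: Half-Step 8-Step Sequence

Step #   Winding A   Winding B   Winding C   Winding D
  1          1           0           0           1
  2          1           0           0           0
  3          1           1           0           0
  4          0           1           0           0
  5          0           1           1           0
  6          0           0           1           0
  7          0           0           1           1
  8          0           0           0           1

Clockwise: steps in the order 1 through 8. Counterclockwise: steps in the order 8 through 1.


Example 12-4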


Write a Turbo C++ program to turn the stepper motor clockwise continuously. Pressing any key 
should exit the program. 


Solution: 


//Turning the stepper motor clockwise continuously
#include <conio.h>
#include <stdio.h>
int main()
{
    _outp(0x303,0x80);          //CONFIGURE 8255 AS OUT
    printf("\n Turning the Stepper motor clockwise. Press any key to exit this program\n");
    do
    {
        _outp(0x300,0x99);
        _sleep(500);            //500 msec
        _outp(0x300,0xcc);
        _sleep(500);            //500 msec
        _outp(0x300,0x66);
        _sleep(500);            //500 msec
        _outp(0x300,0x33);
        _sleep(500);            //500 msec
    }
    while(!kbhit());            //PRESS ANY KEY TO STOP
    return(0);
}


Motor speed 


The motor speed, measured in steps per second (steps/s), is a function of the 
switching rate. Notice in Example 12-2 that by changing the length of the time delay loop, 
we can achieve various rotation speeds. 


Holding torque 


The following is the definition of the holding torque: “With the motor shaft at
standstill or zero RPM condition, the amount of torque, from an external source, required
to break away the shaft from its holding position. This is measured with rated voltage and
current applied to the motor.” The unit is ounce-inch (or kg-cm).


Wave drive 4-step sequence 


In addition to the 8-step sequence and the 4-step sequence discussed earlier, there
is another sequence called the wave drive 4-step sequence. It is shown in Table 12-8.
Notice that the 8-step sequence of Table 12-7 is simply the combination of the wave drive
4-step and normal 4-step sequences shown in Tables 12-8 and 12-5, respectively.
Experimenting with the wave drive 4-step is left to the reader.


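Table 12-8: Wave Drive 4-Step Sequence

Step #   Winding A   Winding B   Winding C   Winding D
  1          1           0           0           0
  2          0           1           0           0
  3          0           0           1           0
  4          0           0           0           1

Clockwise: steps in the order 1, 2, 3, 4. Counterclockwise: steps in the order 4, 3, 2, 1.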


Notice that if a given motor requires more current than the ULN2003 can provide, 
we can use transistors, as shown in Figure 12-9. 


Figure 12-9. Using Transistors for Stepper Motor Driver (2N2222 transistors Q1-Q4 with
1N4001 diodes; use the TIP110 part for Q1-Q4 if the motor needs several amps)




Review Questions 


1. Give the 4-step sequence of a stepper motor if we start with 0110. 
2. A stepper motor with a step-angle of 5 degrees has ____ steps per revolution.
3. Why do we put a driver between the 8255 and the stepper motor? 


SECTION 12.3: INTERFACING TO A DAC 


This section will show how to interface a DAC (digital-to-analog converter) to a 
PC via the 8255. Then we demonstrate how to generate a sine wave on the scope using 
the DAC. 


Digital-to-analog (DAC) converter 


The digital-to-analog converter (DAC) is a device widely used to convert digital 
pulses to analog signals. In this section we discuss the basics of interfacing a DAC to a 
PC. 

Recall from your digital electronics book the two methods of making a DAC:
binary weighted and R/2R ladder. The vast majority of integrated circuit DACs, including
the MC1408 used in this section, use the R/2R method since it can achieve a much higher
degree of precision. The first criterion for judging a DAC is its resolution, which is a
function of the number of binary inputs. The common ones are 8, 10, and 12 bits. The
number of data bit inputs decides the resolution of the DAC since the number of analog
output levels is equal to 2^n, where n is the number of data bit inputs. Therefore, an 8-input
DAC such as the MC1408 provides 256 discrete voltage (or current) levels of output.
Similarly, the 12-bit DAC provides 4096 discrete voltage levels. Although there are 16-bit
DACs, they are expensive.


DAC 808 


In the DAC808, the digital inputs are converted to current (Iout). By connecting a
resistor to the Iout pin, we convert the result to voltage. The total current provided by the
Iout pin is a function of the binary numbers at the D0-D7 inputs of the 1408 and the
reference current (Iref), and is as follows:


Iout = Iref x (D7/2 + D6/4 + D5/8 + D4/16 + D3/32 + D2/64 + D1/128 + D0/256)


where D0 is the LSB, D7 is the MSB for the inputs, and Iref is the input current that must
be applied to pin 14. The Iref current is generally set to 2.0 mA. Figure 12-10 shows the
generation of current reference (setting Iref = 2 mA) by using the standard 5-V power
supply and 1K, 1.5K ohm standard resistors. Some also use the zener diode (LM336), which
overcomes any fluctuation associated with the power supply voltage. Now assuming that
Iref = 2 mA, if all the inputs to the DAC are high, the maximum output current is 1.99 mA
(verify this for yourself).
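
Because the bit weights 1/2 through 1/256 simply add up to the input byte divided by 256, the output current formula can be checked with one line of arithmetic. The sketch below is illustrative only; the function name dac_iout_ma is our own.

```c
#include <stdint.h>

/* Hypothetical helper: output current in mA for an 8-bit input code.
   D7/2 + D6/4 + ... + D0/256 is the same as code/256. */
double dac_iout_ma(uint8_t code, double iref_ma)
{
    return iref_ma * code / 256.0;
}
```

With Iref = 2 mA and all inputs high (FFH), the result is 2 x 255/256 = 1.9921875 mA, which is the 1.99 mA quoted above.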


Converting Iout to voltage in 1408 DAC


We connect the output pin Iout to a resistor, convert this current to voltage, and
monitor the output on the scope. However, in real life this can cause inaccuracy since the
input resistance of the load where it is connected will also affect the output voltage. For
this reason, the Iout current output is isolated by connecting it to an op amp such as the 741
with Rf = 5 kilohms for the feedback resistor. Assuming that R = 5 kilohms, by changing
the binary input, the output voltage changes as shown in Example 12-5.


Example 12-5


Assuming that R = 5K and Iref = 2 mA, calculate Vout for the following binary inputs:
(a) 10011001 binary (99H) (b) 11001000 (C8H)


Solution:


(a) Iout = 2 mA x (153/256) = 1.195 mA and Vout = 1.195 mA x 5K = 5.975 V
(b) Iout = 2 mA x (200/256) = 1.5625 mA and Vout = 1.5625 mA x 5K = 7.8125 V


Generating a sine wave


To generate a sine wave, we first need a table whose values represent the magnitude
of the sine of angles between 0 and 360 degrees. The values for the sine function vary
from -1.0 to +1.0 for 0 to 360 degree angles. Therefore, the table values are integer
numbers representing the voltage magnitude for the sine of theta. This method ensures that
only integer numbers are output to the DAC by the x86 processor. Table 12-9 shows the
angles, the sine values, the voltage magnitude, and the integer values representing the
voltage magnitude for each angle with 30-degree increments. To generate Table 12-9, we
assumed the full-scale voltage of 10V for the DAC output. Full-scale output of the DAC
is achieved when all the data inputs of the DAC are high. Therefore, to achieve the
full-scale 10V output, we use the following equation.


Vout = 5 V + (5 V x sin θ)


To find the value sent to the DAC for various angles, we simply multiply the Vout
voltage by 25.60 because there are 256 steps and full-scale Vout is 10 volts. Therefore, 256
steps / 10 V = 25.6 steps per volt. To further clarify this, look at Example 12-6.


Example 12-6 
Verify the values of Table 12-9 for the following angles: (a) 30 (b) 60. 


Solution: 


(a) Vout = 5 V + (5 V x sin θ) = 5 V + 5 x sin 30 = 5 V + 5 x 0.5 = 7.5 V
    DAC input values = 7.5 V x 25.6 = 192 (decimal)

(b) Vout = 5 V + (5 V x sin θ) = 5 V + 5 x sin 60 = 5 V + 5 x 0.866 = 9.33 V
    DAC input values = 9.33 V x 25.6 = 238 (decimal)
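
The two-step calculation in Example 12-6 (voltage magnitude, then multiply by 25.6) can be sketched as one function. This is a minimal illustration under our own assumptions: the name dac_code_from_sine is hypothetical, the sine value is passed in by the caller (it could come from sin() in math.h), the fraction is truncated as in the worked values above, and the result is limited to 255 because a sine of 1.0 would otherwise produce 256, which does not fit in 8 bits.

```c
#include <stdint.h>

/* Hypothetical helper: DAC input byte for a given sine value (-1.0 to +1.0),
   assuming a 10 V full scale centered on 5 V. */
uint8_t dac_code_from_sine(double sine_value)
{
    double vout = 5.0 + 5.0 * sine_value;   /* voltage magnitude, 0 to 10 V */
    double code = vout * 25.6;              /* 256 steps / 10 V = 25.6 steps per volt */
    if (code < 0.0)
        return 0;
    if (code > 255.0)
        return 255;                         /* largest value an 8-bit DAC accepts */
    return (uint8_t)code;                   /* drop the fraction */
}
```

Feeding in 0.5 (30 degrees) and 0.866 (60 degrees) reproduces the 192 and 238 found in the example.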


The following program sends the values of Table 12-9 to the DAC. See Figure 12-11.

;in data segment
TABLE   DB   128,192,238,255,238,192,128,64,17,0,17,64,128
;in code segment
;PA is assumed to be output

A1:     MOV  CX,12             ;count
        MOV  BX,OFFSET TABLE
        MOV  DX,PORTA          ;port A address
NEXT:   MOV  AL,[BX]
        OUT  DX,AL
        INC  BX
        CALL DELAY             ;let DAC recover
        LOOP NEXT
        JMP  A1                ;do it again


To produce a simple stair-step sine wave, we can use Example 12-7. 
Example 12-8 uses the Turbo C++ math functions to generate the look-up table values. 
You can use Visual C++ instead. 




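Table 12-9: Angle v. Voltage Magnitude for Sine Wave

Angle θ      Sin θ     Vout (Voltage Magnitude)    Values Sent to DAC (decimal)
(degrees)              5 V + (5 V x sin θ)         (Voltage Mag. x 25.6)

    0         0             5                          128
   30         0.5           7.5                        192
   60         0.866         9.33                       238
   90         1.0          10                          255
  120         0.866         9.33                       238
  150         0.5           7.5                        192
  180         0             5                          128
  210        -0.5           2.5                         64
  240        -0.866         0.669                       17
  270        -1.0           0                            0
  300        -0.866         0.669                       17
  330        -0.5           2.5                         64
  360         0             5                          128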

Figure 12-10. 8255 Connection to DAC808 (Vref(+) and Vref(-) reference resistors, range
control, and op amp output to scope with Vout = 0-10 V)


Example 12-7


In order to generate a stair-step ramp, set up the circuit in Figure 12-10 and connect the output
to an oscilloscope. Then write a program to send data to the DAC to generate a stair-step ramp.


Solution:


        MOV  AL,80H
        MOV  DX,303H
        OUT  DX,AL
        MOV  DX,300H
AGAIN:  MOV  AH,01
        INT  16H           ;CHECK KEY PRESS
        JNZ  STOP          ;EXIT UPON KEY PRESS
        SUB  AL,AL
BACK:   OUT  DX,AL
        INC  AL
        CMP  AL,0
        JZ   AGAIN
        CALL DELAY         ;LET DAC RECOVER
        JMP  BACK
STOP:                      ;EXIT


Figure 12-11. Angle v. Voltage Magnitude for Sine Wave (x-axis: angle in degrees, 0 to 360
in 30-degree increments; y-axis: Vout, 0 to 10 V)


Review Questions 


1. In a DAC, input is (digital, analog) and output is (digital, analog). 
Answer for ADC input and output as well. 

2. DAC808 is a(n) __-bit D-to-A converter. 

3. The output of DAC808 is in (current, voltage). 




Example 12-8 


Write a Turbo C++ program to generate a sine wave on PA. Use the C++ math functions to gen- 
erate the look-up table values. Pressing any key should exit the program. 


Solution: 


//GENERATING SINE WAVE VIA A DAC CONNECTED TO PORT A
#include <conio.h>
#include <dos.h>                //delay() in Turbo C++
#include <stdio.h>
#include <math.h>
int main()
{
    unsigned char v1;           //v1 IS A BYTE SIZE DATA
    float Vout,magnitude;
    int a;
    outp(0x303,0x80);           //CONFIGURE 8255 AS OUT
    printf("\n Press any key to exit this program\n");
    do
    {
        for(a=0;a<360;a++)      //FOR THE FULL 360 DEGREES
        {
            Vout=5.0+(5.0*sin((3.14*a)/180));   //VOLTAGE MAGNITUDE
            magnitude=Vout*25.6;                //VALUE SENT TO DAC
            v1=(unsigned char)magnitude;        //MAKE IT A BYTE SIZE
            delay(1);
            outp(0x300,v1);                     //OUTPUT TO PORT A
        }
    }
    while(!kbhit());            //PRESS ANY KEY TO EXIT
    return(0);
}


SECTION 12.4: INTERFACING TO ADC CHIPS AND SENSORS


This section will explore interfacing ADC (analog-to-digital converter) chips and 
temperature sensors to a PC. After describing the ADC chips, we show how to interface 
them to the PC using the PC Interface Trainer. Then we examine the characteristics of the 
LM34/35 temperature sensor and show how to interface it with proper signal conditioning.


ADC devices 


Analog-to-digital converters are among the most widely used devices for data 
acquisition. Digital computers use binary (discrete) values, but in the physical world 
everything is analog (continuous). Temperature, pressure (wind or liquid), humidity, and 
velocity are a few examples of physical quantities that we deal with every day. A physi- 
cal quantity is converted to electrical (voltage, current) signals using a device called a 
transducer. Transducers are also referred to as sensors. There are sensors for temperature, 
velocity, pressure, light, and many other natural quantities, and they produce an output 
that is voltage (or current). Therefore, we need an analog-to-digital converter to translate 
the analog signals to digital numbers so that the PC can read them. Next we describe an 
ADC chip. 




ADC0848 chip 


The ADC0848 IC is an analog-to-digital 
converter in the family of the ADC0800 series 1 


from National Semiconductor Corp. Data sheets bi a 
for this chip can be found at their web site, 2 CS 23.0) 
www.national.com Wr 
com: 3 o 
The ADC0848 has a resolution of 8 bits. “a 
It is an 8-channel ADC, thereby allowing it to 4 INTR 21 |0 
monitor up to 8 different analog inputs. See 5 
: DBO0/MAO 20 |A 
Figures 12-12 and 12-13. The ADC0844 chip in 
the same family has 4 channels. The following is 6 DB1/MA1 19 |0 
the discussion of the pins of the ADC0848. 7 DB2/MA2 1810 
CS: Chip select is an active-low input 
used to activate the 848 chip. To access the 848, 8 DB3/MA3 17 | 
this pin must be low. 
a 5 : ; 9 DB4/MA4 16 | 
RD (read): This is an input signal and is 
active low. ADC converts the analog input to its 10 AGND DBS 15 |0 
binary equivalent and holds it in an internal reg- I Vrai DB6 14/0 
ister. RD is used to get the converted data out of 
the 848 chip. When CS = 0, if the RD pin is 12 DGND DB7 13 |0 


asserted low, the 8-bit digital output shows up at 
the D0-D7 data pins. The RD pin is also referred 
to as output enable (OE). Figure 12-12. ADC0848 Chip 
Vref is an input voltage used for the reference voltage. The voltage connected to this
pin dictates the step size. For the ADC0848, the step size is Vref/256 since it is an
8-bit ADC and 2 to the power of 8 gives us 256 steps. See Table 12-10. For example, if
the analog input range needs to be 0 to 4 volts, Vref is connected to 4 volts. That gives
4 V/256 = 15.62 mV for step size. In another case, if we need the step size of 10 mV then
Vref = 2.56 V, since 2.56 V/256 = 10 mV.

Table 12-10: ADC0848 Vref vs. Step Size

Vref (V)     Step size (mV)
5            19.53
4            15.62
2.56         10
1.28         5

DB0-DB7 are the digital data output pins. With a D0-D7 output, the step size, which is
the smallest change, is dictated by the number of digital outputs and the Vref voltage.
To calculate the digital output, we use the following formula:

$$D_{out} = \frac{V_{in}}{\text{step size}}$$

where Dout = digital data output (in decimal), Vin = analog input voltage, and step size
(resolution) is the smallest change, which is Vref/256 for an 8-bit ADC. See Example 12-9
for clarification. Notice that D0-D7 are tri-state buffered and that the converted data is
accessed only when CS = 0 and a low pulse is applied to the RD pin. Also, notice the dual
role of pins D0-D7. They are also used to send in the channel address. This is discussed
next.
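
The Dout formula can be checked numerically. The following is a minimal Python sketch, not
part of the original program set; the function name adc_output and the choice to round to
the nearest code and clamp at 255 are assumptions made for illustration.

```python
def adc_output(vin, vref=2.56, bits=8):
    """Idealized ADC model: Dout = Vin / step size, with step size = Vref / 2**bits.

    Assumption: the result is rounded to the nearest code and clamped to the
    largest code the converter can produce (255 for an 8-bit ADC).
    """
    steps = 2 ** bits
    code = round(vin * steps / vref)      # same as vin / (vref / steps)
    return max(0, min(steps - 1, code))


if __name__ == "__main__":
    for vin in (1.7, 2.1):
        dout = adc_output(vin)
        print(vin, dout, format(dout, "08b"))
```

With Vref = 2.56 V the step size is 10 mV, so the decimal output is simply the input
voltage expressed in units of 10 mV.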

MA0-MA4 (multiplexed address). The ADC0848 uses multiplexed address/data pins to select
the channel. Notice in Figure 12-13 that a portion of the DB0-DB7 pins are also designated
as MA0-MA4. The D0-D7 pins are inputs when the channel's address is sent in. However, when
the converted data is being read, D0-D7 are outputs. While the use of multiplexed
address/data saves some pins, it makes the I/O interfacing more difficult as we will soon
see.




Figure 12-13. ADC0848 Block Diagram


WR (write; a better name might be “start conversion”). This is an input into the 
ADC0848 chip and plays two important roles: (1) It latches the address of the selected 
channel present on the D0-D7 pins, and (2) it informs the ADC0848 to start the conver- 
sion of analog input at that channel. If CS = 0 when WR makes a low-to-high transition, 
the ADC0848 latches in the address of the selected channel and starts converting the ana- 
log input value to an 8-bit digital number. The amount of time it takes to convert is a max- 
imum of 40 microseconds for ADC0848. The conversion time is set by an internal clock. 

CH1-CH8 are 8 channels of the Vin analog inputs. In what is called single-ended mode,
each of the 8 channels can be used for analog Vin, where the AGND (analog ground) pin is
used as a ground reference for all the channels. These 8 channels of input allow us to
read 8 different analog signals, but not all at the same time since there is only a single
D0-D7 output. We select the input channel by using the MA0-MA4 multiplexed address pins
according to Table 12-11. In Table 12-11, notice that MA4 = low and MA3 = high for
single-ended mode. The ADC0848 can also be used in differential mode. In differential
mode, two channels, such as CH1 and CH2, are paired together for the Vin(+) and Vin(-)
differential analog inputs. In that case Vin = CH1(+) - CH2(-) is the differential analog
input. To use the ADC0848 in differential mode, MA4 = don't care, and MA3 is set to low.
For more on this, see the ADC0848 data sheet on the www.national.com web site.

VCC is the +5 volt power supply. 

AGND, DGND (analog ground and digital ground). Both are input pins providing the ground
for both the analog signal and the digital signal. Analog ground is connected to the
ground of the analog Vin while digital ground is connected to the ground of the VCC pin.
The reason that we have two ground pins is to isolate the analog Vin signal from transient
voltages caused by digital switching of the output D0-D7. Such isolation contributes to
the accuracy of the digital data output. Notice that in the single-ended mode the voltage
at the channel is the analog input and AGND is the reference for the Vin. In our
discussion, both the AGND and DGND are connected to the same ground; however, in the
real world of data acquisition, the analog and digital grounds are handled separately.

INTR (interrupt; a better name might be “end of conversion”). This is an output 
pin and is active low. It is a normally high pin and when the conversion is finished, it goes 
low to signal the CPU that the converted data is ready to be picked up. After INTR goes 
low, we make CS = 0 and apply a low pulse to the RD pin to get the binary data out of the 
ADC0848 chip. See Figure 12-14. 


Selecting an input channel 


The following are the steps for data conversion by the ADC0848 chip.

1. While CS = 0 and RD = 1, provide the address of the selected channel (see
   Table 12-11) to the DB0-DB7 pins and apply a low-to-high pulse to the WR pin to
   latch in the address and start the conversion. The channel's addresses are 08H for
   CH1, 09H for CH2, 0AH for CH3, and so on, as shown in Table 12-11. Notice that
   this process not only selects the channel, but also starts the conversion of the analog
   input at the selected channel.

2. While WR = 1 and RD = 1, keep monitoring the INTR pin. When INTR goes low, the
   conversion is finished and we can go to the next step. If INTR is high, keep polling
   until it goes low, signalling end-of-conversion.

3. After the INTR has become low, we must make CS = 0, WR = 1, and apply a low
   pulse to the RD pin to get the data out of the 848 IC chip.


Example 12-9

For a given ADC0848, we have Vref = 2.56 V. Calculate the D0-D7 output if the analog
input is: (a) 1.7 V, and (b) 2.1 V.

Solution:

Since the step size is 2.56/256 = 10 mV, we have the following.
(a) Dout = 1.7 V/10 mV = 170 in decimal, which gives us 10101010 in binary for D7-D0.
(b) Dout = 2.1 V/10 mV = 210 in decimal, which gives us 11010010 in binary for D7-D0.


Table 12-11: ADC0848 Analog Channel Selection (Single-Ended Mode)

Selected Analog Channel    MA4   MA3   MA2   MA1   MA0
CH1                        0     1     0     0     0
CH2                        0     1     0     0     1
CH3                        0     1     0     1     0
CH4                        0     1     0     1     1
CH5                        0     1     1     0     0
CH6                        0     1     1     0     1
CH7                        0     1     1     1     0
CH8                        0     1     1     1     1

Note: Channel is selected when CS = 0, RD = 1, and an L-to-H pulse is applied to WR.
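
The single-ended channel addresses follow a simple pattern: MA4 is low, MA3 is high, and
the lower three bits count the channels, which matches 08H for CH1, 09H for CH2, and 0AH
for CH3. The Python sketch below only illustrates that pattern; the function name
channel_address is hypothetical.

```python
def channel_address(channel):
    """Return the byte to place on D0-D7 to select CH1..CH8 in single-ended mode.

    MA3 (bit 3) is set high and MA4 (bit 4) is left low, as the text states for
    single-ended mode. MA2-MA0 hold the channel number minus one (assumption
    drawn from the 08H, 09H, 0AH sequence given in the text).
    """
    if not 1 <= channel <= 8:
        raise ValueError("channel must be 1 through 8")
    return 0x08 | (channel - 1)


if __name__ == "__main__":
    for ch in range(1, 9):
        print("CH%d -> %02XH" % (ch, channel_address(ch)))
```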


ADC0848 connection to 8255 


The following is a summary of the connection between the 8255 and the ADC0848 as shown
in Figure 12-15.

PA0-PA7 to D0-D7 of ADC    Channel selection (out), data read (in)
PB0 to INTR                Port B as input
PC0 to WR                  Port C as output
PC1 to RD                  Port C as output

Notice the following facts about the above connection:

1. Port A is an output when we select a channel, and it is an input when we read the
   converted data.
2. We must monitor the INTR pin of the ADC for end-of-conversion; therefore, we
   configure PB as input. Since both WR and RD are inputs into ADC, Port C is configured
   as an output port.


[Timing diagram labels: D0-D7, channel address, data out, latch address, end conversion]

Figure 12-15. 8255 Connection to ADC0848 for CH2


The following program is for Figure 12-15. It selects channel 2. After reading its
data, the data is converted from binary (hex) to ASCII. In the program, CL = least
significant digit (LSD), AH = middle digit, and AL = most significant digit (MSD).


        MOV  AL,82H        ;PA=OUT, PB=IN, PC=OUT
        MOV  DX,CNT_PORT
        OUT  DX,AL
        MOV  AL,09         ;CHANNEL 2 ADDRESS (Table 12-11)
        MOV  DX,PORT_A
        OUT  DX,AL
        MOV  AL,02         ;WR=0, RD=1
        MOV  DX,PORT_C
        OUT  DX,AL
        CALL DELAY         ;a few microseconds delay
        MOV  AL,03         ;WR=1, RD=1
        OUT  DX,AL         ;TO LATCH CHANNEL ADDRESS
        CALL DELAY         ;few usec
        MOV  AL,92H        ;PA=IN, PB=IN, AND PC=OUT
        MOV  DX,CNT_PORT
        OUT  DX,AL
        MOV  DX,PORT_B     ;GET READY TO MONITOR INTR
B1:     IN   AL,DX         ;MONITOR INTR
        AND  AL,01         ;MASK ALL BITS EXCEPT INTR
        CMP  AL,01         ;IS IT HIGH?
        JE   B1            ;KEEP MONITORING UNTIL IT GOES LOW
        MOV  AL,01         ;WR=1, RD=0 TO READ DATA
        MOV  DX,PORT_C
        OUT  DX,AL
        MOV  DX,PORT_A     ;READ PORT A TO GET DATA
        IN   AL,DX         ;GET THE CONVERTED DATA
;Converting 00-FFH hex value to decimal and then to ASCII.
;AL,AH,CL will have decimal values in ASCII
        MOV  BL,10
        SUB  AH,AH         ;CLEAR AH FOR WORD/BYTE DIV
        DIV  BL            ;AX/BL
        MOV  CL,AH         ;SAVE LSD IN CL REG
        SUB  AH,AH
        DIV  BL            ;AX/BL FOR 2ND DIGIT
;make them all ASCII
        OR   AX,3030H
        OR   CL,30H


Notice the conversion of the above data to ASCII. In order to display ADC input
on a screen or LCD, it must be converted to ASCII. However, to convert it to ASCII, it
must be converted to decimal first. To convert a 00-FF hex value to decimal we keep
dividing it by 10 until the quotient is less than 10. Each time we divide it by 10 we keep
the remainder as one of our decimal digits, and the final quotient is the most significant
digit. In the case of 8-bit data, dividing it by 10 twice will do the job. For example,
if we have FFH it will become 255 in decimal. To convert from decimal to ASCII format,
we OR each digit with 30H. Now all we have to do is to send the digits to the PC screen
by using INT 21H or send them to the LCD as was shown in the first section of this
chapter. One advantage of using C/C++ programs is that such a conversion is done by the
compiler.
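
The divide-by-10 procedure can be traced in a few lines. This is an illustrative Python
sketch that mirrors the two DIV instructions of the program; the function name
byte_to_ascii_digits is hypothetical.

```python
def byte_to_ascii_digits(value):
    """Convert an 8-bit value (0-255) to three ASCII decimal digits.

    Mirrors the assembly program: the first division by 10 leaves the least
    significant digit as the remainder, the second leaves the middle digit as
    the remainder and the most significant digit as the quotient. Each digit
    is then ORed with 30H to make it ASCII.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    quotient, lsd = divmod(value, 10)       # first DIV BL: remainder is the LSD
    msd, middle = divmod(quotient, 10)      # second DIV BL: quotient is the MSD
    return bytes([msd | 0x30, middle | 0x30, lsd | 0x30])


if __name__ == "__main__":
    print(byte_to_ascii_digits(0xFF))       # the FFH case discussed in the text
```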


Interfacing a temperature sensor to PC 


Transducers are used to convert physical quantities such as temperature, light 
intensity, flow, and speed to electrical signals. Depending on the transducer, the output 
produced is in the form of voltage, current, resistance, or capacitance. For example, tem- 
perature is converted to electrical signals using a transducer called a thermistor. The ther- 
mistor responds to temperature change by changing its resistance. However, its response 
is not linear, as shown in Table 12-12. 

The complexity associated with writing software for such nonlinear devices has led many
manufacturers to market the linear temperature sensor. Simple and widely used linear
temperature sensors include the LM34 and LM35 series from National Semiconductor Corp.
They are discussed next.

Table 12-12: Thermistor Resistance vs. Temperature (C)
From William Kleitz, Digital Electronics


LM34 and LM35 temperature sensors


The sensors of the LM34 series are precision integrated-circuit temperature sensors
whose output voltage is linearly proportional to the Fahrenheit temperature. The
LM34 requires no external calibration since it is inherently calibrated. It outputs 10 mV
for each degree of Fahrenheit temperature. Table 12-13 is the selection guide for the
LM34. The LM35 series sensors are also precision integrated-circuit temperature sensors
whose output voltage is linearly proportional to the Celsius (centigrade) temperature. The
LM35 requires no external calibration since it is inherently calibrated. It outputs 10 mV
for each degree of centigrade temperature. Table 12-14 is the selection guide for the
LM35.


Table 12-13: LM34 Temperature Sensor Series Selection Guide

Columns: Part, Temperature Range, Output Scale
Output scale is 10 mV/F for every part listed, including the LM34CA.

Note: Temperature range is in degrees Fahrenheit.


Table 12-14: LM35 Temperature Sensor Series Selection Guide

Columns: Part, Temperature Range, Output Scale
Output scale is 10 mV/C for every part listed, including the LM35CA. One part in the
series covers 0 C to +100 C.

Note: Temperature range is in degrees Celsius.


Signal conditioning and interfacing the LM35 to a PC 


Signal conditioning is a widely used term in the world of data acquisition. The
most common transducers produce an output in the form of voltage, current, charge,
capacitance, and resistance. However, we need to convert these signals to voltage in order
to send input to an A-to-D converter. This conversion (modification) is commonly called
signal conditioning. Signal conditioning can be a current-to-voltage conversion or a
signal amplification. For example, the thermistor changes resistance with temperature. The
change of resistance must be translated into voltages in order to be of any use to an ADC.
Look at the case of connecting an LM35 to an ADC0848. Since the ADC0848 has 8-bit
resolution with a maximum of 256 steps and the LM35 (or LM34) produces 10 mV for every
degree of temperature change, we can condition Vin of the ADC0848 to produce a Vout of
2560 mV (2.56 V) for full-scale output. Therefore, in order to produce the full-scale
Vout of 2.56 V for the ADC0848, we need to set Vref = 2.56 V. This makes Vout of the
ADC0848 correspond directly to the temperature as monitored by the LM35. This is shown
in Table 12-15.

Figure 12-17 shows connection of the temperature sensor to CH2 of the ADC0848. Notice
that we use the LM336-2.5 zener diode to fix the voltage across the 10k POT at 2.5 volts.
The use of the LM336-2.5 should overcome any fluctuations in the power supply. The LM336
has three leads. However, the third lead is unconnected.
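
The reason Vref = 2.56 V is convenient can be seen with a short calculation. The Python
sketch below is illustrative only; the function name lm35_temperature_c is hypothetical,
and it assumes an ideal LM35 (exactly 10 mV per degree) feeding an ideal 8-bit converter.

```python
def lm35_temperature_c(dout, vref=2.56):
    """Convert an 8-bit ADC reading into degrees Celsius for an LM35 input.

    step size (mV) = Vref * 1000 / 256, and the LM35 produces 10 mV per degree,
    so temperature = dout * step size / 10. With Vref = 2.56 V the step size is
    10 mV and the reading equals the temperature directly.
    """
    step_mv = vref * 1000 / 256
    return dout * step_mv / 10.0


if __name__ == "__main__":
    print(lm35_temperature_c(0b00011110))          # reading 0001 1110 from Table 12-15
    print(lm35_temperature_c(0b00011110, vref=5))  # same code with a 5 V reference
```

With any other reference voltage the program must scale the reading, which is the extra
work that choosing Vref = 2.56 V avoids.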


Figure 12-16. Getting Data to the CPU


Table 12-15: Temperature vs. Vout of the ADC0848

Temp. (C)    Vin (mV)    Vout (D7-D0)
0            0           0000 0000
1            10          0000 0001
2            20          0000 0010
3            30          0000 0011
10           100         0000 1010
30           300         0001 1110


Figure 12-17. 8255 Connection to ADC0848 and Temperature Sensor


ADC808/809


Another popular ADC is the ADC808/809 chip. It has eight input channels allowing it to
convert 8 different analog inputs. See Figure 12-18. It is an 8-bit ADC. The following is
the pin description of the ADC808/809 chip.

OE (output enable): This is an input signal and is active high. ADC converts the 
analog input to its binary equivalent and holds it in an internal register. OE is used to get 
the converted data out of the ADC808 chip. If a low-to-high pulse is applied to the OE 
pin, the 8-bit digital output shows up at the D0-D7 data pins. The OE pin is also referred 
to as RD (read). 

SC (start conversion): This is an input pin and is used to inform the ADC808 to 
start the conversion process. If we apply a low-to-high pulse to this pin, the ADC808 starts 
converting the analog input value of Vin to an 8-bit digital number. The amount of time it
takes to convert varies depending on the CLK value. When the data conversion is com- 
plete, the EOC (end of conversion) pin is forced low by the ADC808. 

CLK is an input pin and is connected to an external clock source. The CLK speed 
dictates the conversion time. While the ADC0848 uses an internal clock, for the ADC808 
the clock source is external. This way one can control the conversion speed. 

EOC (end of conversion): This is an output pin and is active low. It is a normal- 
ly high pin and when the conversion is finished, it goes low to signal the CPU that the con- 
verted data is ready to be picked up. After EOC goes low, we send a low-to-high pulse to 
the OE pin to get the data out of the ADC808 chip. 

Vref(+) and Vref(-) are both input voltages used for the reference voltage. The
voltage connected to these pins dictates the step size. For the ADC808/809, the step size
is [Vref(+) - Vref(-)]/256, since it is an 8-bit ADC and 2 to the power of 8 gives us 256
steps. For example, if the analog input range needs to be 0 to 4 volts, Vref(+) is
connected to 4 volts and Vref(-) is grounded. That gives 4 V/256 = 15.62 mV for the step
size. In another case, if we need the step size of 10 mV, Vref = 2.56 V since
2.56 V/256 = 10 mV. Notice that if we connect the Vref(-) input to a voltage other than
ground, the step size is calculated based on the differential value of the
Vref(+) - Vref(-) inputs.


Figure 12-18. ADC808/809


D0-D7 are the digital data output pins. These are tri-state buffered and the con- 
verted data is accessed only when OE is forced high. While the analog input voltage is in 
the range 0 to +5V, the output D0-D7 is given in binary. 

IN0-IN7 are the 8 channels of the Vin analog inputs. These 8 channels of input
allow us to read 8 different analog signals. However, they cannot all be read at the same
time since there is only a single D0-D7. We select the input channel by using the A, B,
and C address selector pins according to Table 12-16.

A, B, C, and ALE. The input channel is selected by using the A, B, C, and ALE 
pins. These are input signals into the ADC808/809 and the channel is selected according 
to Table 12-16. To select a channel, we provide the channel address to the A, B, and C pins 
according to Table 12-16 and then apply an L-to-H pulse to the ALE pin to latch in the 
address. 


Table 12-16: ADC808/809 Analog Channel Selection

Selected Analog Channel    C    B    A
IN0                        0    0    0
IN1                        0    0    1
IN2                        0    1    0
IN3                        0    1    1
IN4                        1    0    0
IN5                        1    0    1
IN6                        1    1    0
IN7                        1    1    1

Note: Channel is selected when OE = 0, and an L-to-H pulse is applied to ALE.


How to read ADC808/809 data 


Comparing the ADC808/809 with the ADC0848 shows that the ADC808/809 has 
a clock pin. This means that we must provide an external clock source. Therefore, the con- 
version speed varies according to the speed of the external clock source. Also, notice that 
the ADC808/809 has no CS pin. 




The following are the steps to select a channel and read its data.

1. Provide the channel address (see Table 12-16) to pins A, B, and C.
2. Apply an L-to-H pulse to the ALE pin to latch in the channel address.
3. Apply an L-to-H pulse to the SC pin to start the conversion of analog input to digital
   data.
4. After the passage of 8 clocks, the EOC pin will go low to indicate that the data is
   converted and ready to be picked up. We can either use a small time delay and then read
   the data out, or monitor the EOC pin and read the data out after it goes low. Notice
   that if you use a time delay to wait before you read the data, the size of the delay
   varies depending on the speed of the clock connected to the clock pin of the ADC808/809.
5. Apply an L-to-H pulse to the OE pin and read the data.


Review Questions 


1. In the ADC0848, the INTR signal is an ____ (input, output).
2. To begin conversion, send a(n) ____ pulse to the ____.
3. Which pin of the ADC0848 indicates end-of-conversion?
4. The LM35 provides ____ mV for each degree of ____ (Fahrenheit, Celsius) temperature.
5. Both the ADC0848 and ADC808 are ____-bit converters.


PROBLEMS 


SECTION 12.1: INTERFACING TO AN LCD 


1. The LCD discussed in this section has ____ (4, 8) data pins.
2. Describe the function of pins E, R/W, and RS in the LCD.
3. What is the difference between the VCC and VEE pins on the LCD?
4. Clear LCD is a ____ (command code, data item) and its value is ____ hex.
5. What is the hex value of the command code for display on, cursor on?
6. Give the state of RS, E, and R/W when sending a command code to the LCD.
7. Give the state of RS, E, and R/W when sending data character 'Z' to the LCD.
8. Which of the following is needed on the E pin in order for a command code (or data)
   to be latched in by the LCD?
   (a) H-to-L pulse (b) L-to-H pulse
9. True or false. For the above to work, the value of the command code (data) must be
   already at the D0-D7 pins.
10. There are two methods of sending streams of characters to the LCD: (1) checking the
    busy flag, or (2) putting some time delay between each character without checking the
    busy flag. Explain the difference and advantage and disadvantage of each method.
    Also explain how we monitor the busy flag.
11. For a 16 x 2 LCD the location of the last character of line 1 is 8FH (its command
    code). Show how this value came about.
12. For a 16 x 2 LCD the location of the first character of line 2 is C0H (its command
    code). Show how this value came about.
13. For a 20 x 2 LCD the location of the last character of line 2 is 93H (its command
    code). Show how this value came about.
14. For a 20 x 2 LCD the location of the third character of line 2 is C2H (its command
    code). Show how this value came about.
15. For a 40 x 2 LCD the location of the last character of line 1 is A7H (its command
    code). Show how this value came about.
16. For a 40 x 2 LCD the location of the last character of line 2 is E7H (its command
    code). Show how this value came about.
17. Show the value (in hex) for the command code for the 10th location, line 1 on a 20 x
    2 LCD. Show how you got your value.
18. Show the value (in hex) for the command code for the 20th location, line 2 on a 40 x 2
    LCD. Show how you got your value.
19. Rewrite the COMNDWRT procedure (shown in Section 12.1) if port C is used for control
    signals. Assume that PC4 = RS, PC5 = R/W, PC6 = E.
20. Repeat the above program for a data write procedure. Send the string "Hello" to the
    LCD without checking the busy flag.


LCD Connection for Problem 19


SECTION 12.2: INTERFACING TO A STEPPER MOTOR 


21. If a motor takes 90 steps to make one complete revolution, what is the step angle for
    this motor?
22. Calculate the number of steps per revolution for a step angle of 7.5 degrees.
23. Finish the normal 4-step sequence clockwise if the first step is 0011 (binary).
24. Finish the normal 4-step sequence clockwise if the first step is 1100 (binary).
25. Finish the normal 4-step sequence counterclockwise if the first step is 1001 (binary).
26. Finish the normal 4-step sequence counterclockwise if the first step is 0110 (binary).
27. What is the purpose of the ULN2003 placed between the 8255 and the stepper motor?
    Can we use that for 3A motors?
28. Which of the following cannot be a sequence in the normal 4-step sequence for a
    stepper motor?
    (a) CCH (b) DDH (c) 99H (d) 33H
29. What is the effect of a time delay between issuing each step?
30. In Question 29, how can we make a stepper motor go faster?


SECTION 12.3: INTERFACING TO A DAC 


31. To get a smaller step, we need a DAC with ____ (more, fewer) digital inputs.
32. To get full-scale output, what should be the inputs for DAC?
33. True or false. DAC1408 is the same as DAC0808. Are they pin compatible?
34. Find the number of discrete voltages provided by the n-bit DAC for the following.
    (a) n = 8 (b) n = 10 (c) n = 12
35. For DAC1408, if Iref = 2 mA show how to get the Iout of 1.99 when all inputs are high.
36. Find Iout for the following inputs. Assume Iref = 2 mA for DAC1408.
    (a) 10011001 (b) 11001100 (c) 11101110
    (d) 00100010 (e) 00001001 (f) 10001000


SECTION 12.4: INTERFACING TO ADC CHIPS AND SENSORS 


37. Give the status of CS and WR needed to start conversion for the ADC0848.
38. Give the status of CS and WR needed to get data from the ADC0848.
39. In the ADC0848 what happens to the converted analog data? How do we know that
    the ADC is ready to provide us the data?
40. In the ADC0848 what happens to the old data if we start conversion again before we
    pick up the last data?
41. In the ADC0848 INTR is an ____ (input, output) signal. What is its function in
    the ADC0848?
42. For an ADC0848 chip, find the step size for each of the following Vref values.
(a) Viep PVE (B) Viep=1V_ (C) Vier = 1.9V 


43. In the ADC0848 what should be the Vref value if we want a step size of 20 mV?
44. In the ADC0848 what should be the Vref value if we want a step size of 5 mV?
45. In the ADC0848 how is the analog channel selected?
46. With a step size of 19.53 mV, what is the analog input voltage if all outputs are 1?
47. With Vref = 1.28 V, find the Vin for the following outputs.
    (a) D7-D0 = 11111111 (b) D7-D0 = 10011001 (c) D7-D0 = 1101100
48. What does it mean when it is said that a given sensor has a linear output?
49. The LM34 sensor produces ____ mV for each degree of temperature.
50. What is signal conditioning?


ANSWERS TO REVIEW QUESTIONS 
SECTION 12.1: INTERFACING TO AN LCD 


1. Input 2. Input 3. H-to-L
4. High 5. 80H and C0H


SECTION 12.2: INTERFACING TO A STEPPER MOTOR 


1. 0110, 0011, 1001, 1100 for clockwise, and 0110, 1100, 1001, 0011 for counterclockwise
Ba 2 


3. Because the 8255 does not provide sufficient current to drive the stepper motor 
SECTION 12.3: INTERFACING TO A DAC 


1. Digital, analog. In ADC the input is analog, the output is digital.
25 
3. Current 


SECTION 12.4: INTERFACING TO ADC CHIPS AND SENSORS 


1. Output
2. L-to-H, WR pin
3. INTR
4. 10, both
5. 8




CHAPTER 13 


8253/54 TIMER 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Describe the function of each pin of the 8253/54 PIT (programmable
   interval timer)
>> Program the three counters of the 8253/54 by use of the chip's control
   word
>> Diagram how the 8253/54 timer is connected in the IBM PC
>> Write programs to play music notes on the x86 PC speaker


In the PC there is a single clock used to synchronize activities of all peripheral 
chips connected to the CPU. That clock, which has the highest frequency in the system, 
belongs to the x86 CPU. There are functions within the PC that require a clock with a 
lower frequency. The 8253/54 PIT (programmable interval timer) is used to bring down 
the frequency to the desired level for various uses such as the beep sound in the PC. This
chapter focuses on the 8253/54 timer chip and how it is used in the x86 PC. In Section 13.1
we will describe the 8253/54 timer and show the processes of initializing and program- 
ming it. The interfacing and the use of the 8253/54 in the x86 PC is discussed in Section 
13.2. Section 13.3 will show how the 8253/54 can be used to generate various frequen- 
cies, including musical notes on the x86 PC. 


SECTION 13.1: 8253/54 TIMER 


The 8253 chip was used in the IBM PC/XT, but starting with the IBM PC AT, the 
8254 replaced the 8253. The 8254 and 8253 have exactly the same pinout. The 8254 is a 
superset of the 8253, meaning that all programs written for the 8253 will run on the 8254.


The following are pin descriptions of the 8253/54.

A0, A1, CS

Inside the 8253/54 timer, there are three counters. Each works independently and is
programmed separately to divide the input frequency by a number from 1 to 65,536. Each
counter is assigned an individual port address. The control register common to all three
counters has its own port address. This means that a total of 4 ports are needed for a
single 8253/54 timer. The ports are addressed by A0, A1, and CS, as shown in Table 13-1.
Each of the three counters has three pins associated with it, CLK (clock), GATE, and OUT,
as shown in Figure 13-1. See Example 13-1.

Table 13-1: Addressing 8253/54

CS    A1    A0    Port
0     0     0     Counter 0
0     0     1     Counter 1
0     1     0     Counter 2
0     1     1     Control register
1     x     x     8253/54 is not selected

CLK 

CLK is the input clock frequency, which can range between 0 and 2 MHz for the 
8253. For input frequencies higher than 2 MHz, the 8254 must be used; the 8254 can go 
as high as 8 MHz, and the 8254-2 can go as high as 10 MHz. 

OUT 

Although the input frequency is a square wave of 33% duty cycle, the shape of
the output frequency coming from the OUT pin after being divided can be programmed.
Among the options are square wave, one shot, and other square shape waves of various
duty cycles but no sine wave or saw tooth shapes.

GATE 

This pin is used to enable or disable the counter. Putting HIGH (5 V) on GATE 
enables the counter, whereas LOW (0 V) disables it. In some modes a 0 to 1 pulse must 
be applied to GATE to enable the counter. 

D0-D7 

The D0-D7 data bus of the 8253/54 is a bidirectional bus connected to D0-D7 of 
the system data bus. The data bus allows the CPU to access various registers inside the 
8253/54 for both read and write operations. RD and WR (both active low) are connected 
to IOR and IOW control signals of the system bus. 




Initialization of the 8253/54 


Each of the three counters of the 8253/54 must be programmed separately. In order to
program any of the three counters, the control byte must first be written into the control
register, which among other things tells the 8253/54 what shape of output pulse is needed.
In addition, the number that the input clock should be divided by must be written into
that counter of the 8253/54. Since this number can be as high as FFFF (16-bit data) and
the data bus for the 8253/54 timer is only 8 bits wide, the divisor must be sent in one
byte at a time. The 8253/54 must be initialized before it is used.


Figure 13-1. 8253 Pin and Function Diagram


Control word


Figure 13-2 shows the one-byte control word of the 8253/54. This byte, which is sent to
the control register, has the following bits.

D0 chooses between a binary number divisor of 0000 to FFFFH or a BCD divisor of 0000 to
9999H. The lowest number that the input frequency can be divided by for both options is
0001. The highest number is 2 to the power of 16 for binary and 10 to the power of 4 for
BCD. To get the highest count (65,536 decimal and 10000 BCD), the counter is loaded
with zeros.

D1, D2, and D3 are for mode selection. There are six possible modes that determine the
shape of the output signal.


Mode 0 Interrupt on terminal count 
Mode 1 Programmable one shot 
Mode 2 Rate generator 

Mode 3 Square wave rate generator 
Mode 4 Software triggered strobe 
Mode 5 Hardware triggered strobe 


D4 and D5 are for RL0 and RL1. The data bus of the 8253/54 is 8 bits (1 byte),
but the number that the input frequency can be divided by (divisor) can be as high as
FFFFH. Therefore, RL0 and RL1 are used to indicate the size of the divisor. RL0 and RL1
have three options: (1) read/write the most significant byte (MSB) only, (2) read/write the
least significant byte (LSB) only, (3) read/write the LSB first followed immediately by the
MSB.

The options for RL0 and RL1 show that programmers can not only write the value
of the divisor into the 8253/54 timer but read the contents of the counter at any given
time, as well. Since all counters are down counters, and the count register is decremented,
the count register's contents can be read at any time, thus using the 8253/54 as an event
counter.

D6 and D7 are used to select which of the three counters, counter 0, counter 1, or 
counter 2, is to be initialized by the control byte. 


D7 D6 (SC)      0 0   Select counter 0
                0 1   Select counter 1
                1 0   Select counter 2

D5 D4 (RL)      0 0   Counter latching operation
                0 1   Read/load LSB only
                1 0   Read/load MSB only
                1 1   Read/load LSB first, then MSB

D3 D2 D1        Mode selection (mode 0 to mode 5)

D0              0     Binary counter (16-bit)
                1     BCD (4 decades)


Figure 13-2. 8253/54 Control Word Format


Example 13-1 


Pin CS of a given 8253/54 is activated by binary address A7-A2 = 100101.
(a) Find the port addresses assigned to this 8253/54.
(b) Find the configuration for this 8253/54 if the control register is programmed as follows.

        MOV  AL,00110110B
        OUT  97H,AL

Solution:

(a) From Table 13-1, we have the following:

    CS (A7-A2)   A1   A0   Port               Port address (hex)
    1001 01      0    0    Counter 0          94
    1001 01      0    1    Counter 1          95
    1001 01      1    0    Counter 2          96
    1001 01      1    1    Control register   97

(b) Breaking down the control word 00110110 and comparing it with Figure 13-2 indicates
    counter 0 since the SC bits are 00. The RL bits of 11 indicate that the low-byte
    read/write is followed by the high byte. The mode selection is mode 3 (square wave),
    and finally binary counting is selected since the D0 bit is 0.
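
The address decoding and control-word breakdown of Example 13-1 can be reproduced with a
short calculation. The Python sketch below is illustrative only; the function names
port_addresses and decode_control_word are hypothetical, and the handling of mode-field
values 6 and 7 follows the 8253 data sheet rather than anything stated in this chapter.

```python
def port_addresses(cs_bits):
    """Return the four port addresses for a chip selected by A7-A2 = cs_bits.

    A1 and A0 pick counter 0, counter 1, counter 2, and the control register.
    """
    base = cs_bits << 2
    return [base + offset for offset in range(4)]


def decode_control_word(cw):
    """Split an 8253/54 control byte into its SC, RL, mode, and BCD fields."""
    mode = (cw >> 1) & 0b111
    if mode > 5:
        # Assumption from the data sheet: D3 is a don't-care for modes 2 and 3.
        mode &= 0b011
    return {
        "counter": (cw >> 6) & 0b11,   # D7 D6
        "rl": (cw >> 4) & 0b11,        # D5 D4
        "mode": mode,                  # D3 D2 D1
        "bcd": bool(cw & 1),           # D0
    }


if __name__ == "__main__":
    print([format(p, "02X") for p in port_addresses(0b100101)])
    print(decode_control_word(0b00110110))
```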


To program a given counter of the 8253/4 to divide the CLK input frequency, one 
must send the divisor to that specific counter's register. In other words, although all three 
counters share the same control register, the divisor registers are separate for each count- 
Se 


Regarding the options bit DO of the control byte, it must be noted that in BCD 


ÁT 


352 


Example 13-2


Using the port addresses in Example 13-1, show the programming of counter 1 to divide CLK1
by 10,000, producing the mode 3 square wave. Use the BCD option in the control byte.


Solution:


        MOV  AL,77H       ;counter 1, mode 3, BCD
        OUT  97H,AL       ;send it to control register
        SUB  AL,AL        ;AL = 0; load the divisor for 10,000
        OUT  95H,AL       ;send the low byte
        OUT  95H,AL       ;and then the high byte to counter 1


Example 13-3


Use the port addresses in Example 13-1 to:

(a) program counter 0 for BCD count of mode 3 (square wave) to divide CLK0 by number
4282 (BCD),

(b) program counter 2 for binary count of mode 3 (square wave) to divide CLK2 by number
C26A hex,

(c) find the frequency of OUT0 and OUT2 in (a) and (b) if CLK0 = 1.2 MHz, CLK2 = 1.8 MHz.


Solution: 


(a) To program counter 0 for mode 3 with BCD count, we have 00110111 for the control word.
Therefore,
        MOV  AL,37H       ;counter 0, mode 3, BCD
        OUT  97H,AL       ;send it to control register
        MOV  AX,4282H     ;load the divisor (BCD needs H for hex)
        OUT  94H,AL       ;send the low byte
        MOV  AL,AH        ;to counter 0
        OUT  94H,AL       ;and then the high byte to counter 0


(b) By the same token:
        MOV  AL,0B6H      ;counter 2, mode 3, binary (hex)
        OUT  97H,AL       ;send it to control register
        MOV  AX,0C26AH    ;load the divisor
        OUT  96H,AL       ;send the low byte
        MOV  AL,AH        ;to counter 2
        OUT  96H,AL       ;send the high byte to counter 2


(c) The output frequency for OUT0 is 1.2 MHz divided by 4282, which is 280 Hz. Notice
that the program in part (a) used instruction "MOV AX,4282H" since BCD and hex
numbers are represented in the same way, up to 9999. For OUT2, CLK2 of 1.8 MHz is
divided by 49770 since C26AH = 49770 in decimal. Therefore, OUT2 frequency is a
square wave of 36 Hz.


mode, if we program the counter for 9999, the input frequency is divided by that number. 
However, to divide the frequency by 10,000 we must send in 0 for both high and low 
bytes. See Examples 13-2 and 13-3. 

We can program any of the counters for divisors of up to 65,536 if we use the
binary option for D0. To program the counter for the divisor of 65,536, the counter must
be loaded with 0 for the low byte and another 0 for the high byte of the divisor. In that
case, D0 = 0 for the control byte.
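
The rule that a count of 0 stands for the largest divisor (10,000 in BCD, 65,536 in binary) is easy to get wrong. The following Python sketch shows the two bytes that would be written to a counter for a given divisor. The helper name divisor_bytes is hypothetical, and the range check is simplified: it ignores any mode-specific minimum count.

```python
def divisor_bytes(divisor, bcd=False):
    """Return (low_byte, high_byte) to send to a counter for this divisor."""
    limit = 10_000 if bcd else 65_536
    if not 1 <= divisor <= limit:
        raise ValueError(f"divisor must be between 1 and {limit}")
    if divisor == limit:
        # The maximum divisor is programmed by loading 0 into both bytes.
        return (0, 0)
    if bcd:
        # In BCD each decimal digit occupies one hex digit: 4282 -> 4282H.
        value = int(str(divisor), 16)
    else:
        value = divisor
    return (value & 0xFF, value >> 8)

for d, is_bcd in [(10_000, True), (4282, True), (0xC26A, False), (65_536, False)]:
    low, high = divisor_bytes(d, is_bcd)
    print(f"divisor {d}: low byte {low:02X}H, high byte {high:02X}H")
```

The low byte is always sent first when the RL bits are 11, as in Examples 13-2 and 13-3.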




Review Questions 


1. True or false. Any code written for the 8253 will work on the 8254. 

2. The 8253/54 can be used to (divide, multiply) a square wave digital 
frequency. 

3. If CS of the 8253/54 is activated by A7-A2 = 0110 00 binary, find the port address
for this timer.

4. Find the control byte to program counter 2 for mode 1 (programmable one shot), 
BCD count, low byte, followed by high byte R/W. 

5. True or false. To divide input frequency CLK1 by 5065, we must send the 5065 to 
the control register. 

6. For Question 5, give the port address using the ports in Question 3. 

7. To divide the CLK frequency by 52,900, which option for D0 of the control byte
must be selected, and why?

8. If D0 = 0 in the control byte, what is the highest number for the divisor?


SECTION 13.2: x86 PC 8253/54 TIMER CONNECTION AND 
PROGRAMMING 


The first IBM PC used a 74LS138 to decode addresses for CS of the 8253 as 
shown in Figure 13-3. The port addresses are selected as indicated in Table 13-2, assum- 
ing zeros for x's. Chapter 11 contains a complete discussion of port selection. 

The three clocks of the 8253, CLK0, CLK1, and CLK2, are all connected to a
constant frequency of 1.1931817 MHz. This frequency is from PCLK of the 8284 chip
after it has been divided by 2 with the use of D flip-flop 74LS175, as shown in
Figure 13-4. PCLK of the 8284 (discussed in Chapter 9) is 2.3863633 MHz and must be
divided by 2 since the maximum allowed input frequency of CLK of the 8253 is 2 MHz.


Table 13-2: 8253/54 Port Address Calculation in the x86 PC 


    Hex Address   Function
    40            Counter 0
    41            Counter 1
    42            Counter 2
    43            Control register


Figure 13-3. 8253 Port Selection in the x86 PC 




GATE0 and GATE1, which enable counter 0 and counter 1, respectively, are connected to
HIGH (5 V), thereby making those two counters enabled permanently. GATE2 of counter 2
can be enabled or disabled through PB0 of port B of the 8255. Now that the input
frequency to each timer is known, programming and applications of each counter in the PC
can be explained.


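[Figure: 8253/54 connections. OUT0 goes to IRQ0 of the 8259 at 18.2 Hz; OUT1 goes through a
74LS74 to the DMA request input of the 8237; OUT2 goes to the speaker driving circuitry
(74LS38 open collector) and is gated by the speaker data bit of port 61H, with PB0 of port 61H
enabling GATE2; CLK0-CLK2 come from PCLK of the 8284 divided by 2 by the 74LS175.]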


Figure 13-4. 8253/54 Chip Connection in the x86 PC 


Using counter 0 


CLK0 of counter 0 is 1.193 MHz, and GATE0 is connected to high permanently.
OUT0 of counter 0 is connected to IRQ0 (the highest priority interrupt) of the 8259 interrupt
controller. The next question is: How often is IRQ0 activated, or in other words, what
is the output frequency? IRQ0 is activated 18.2 times per second, or put another way, the
OUT0 frequency is 18.2 Hz. If the frequency of CLK0 is 1.193 MHz and the output frequency
should be 18.2 Hz, the counter must be programmed to divide 1.193 MHz by
65,536. The wave shape is a square wave (mode 3 of the 8253) in order to trigger IRQ0 on
the positive edge of each pulse of the square wave so that a high pulse will not be mistaken
for a multiple interrupt. Using the above information and Figure 13-2, the control word
can be calculated in the following way:

D0 = 0 for the binary (or hex) value of the counter divisor. The timer is decremented
after every input pulse until it reaches zero and then the original value is loaded
again. Therefore, to divide the input frequency by 65,536, the timer is programmed with
0s for both high and low bytes.

D3 D2 D1 = 011, mode 3, for the square wave output of 18.2 Hz frequency.

D5 D4 = 11 for reading/writing the LSB first, followed by the MSB.

D7 D6 = 00 for counter 0.


Summarizing the above gives the following control word: 


    D7 D6 D5 D4 D3 D2 D1 D0
    0  0  1  1  0  1  1  0  = 36H

The programming of counter 0 is as follows:

        MOV  AL,36H       ;control word
        OUT  43H,AL       ;to control register of 8253
        MOV  AL,00        ;00 LSB and MSB of the divisor
        OUT  40H,AL       ;LSB to timer 0
        OUT  40H,AL       ;MSB to timer 0




The IBM PC BIOS shows the same process as follows: 
                  22   TIMER  EQU  40H


    E277 B036    695          MOV  AL,36H       ;SET TIM 0, LSB, MSB, MODE 3
    E279 E643    696          OUT  TIMER+3,AL   ;WRITE TIMER MODE REG
    E27B B000    697          MOV  AL,0
    E27D E640    698          OUT  TIMER,AL     ;WRITE LSB TO TIMER 0 REG


    E284 E640    704          OUT  TIMER,AL     ;WRITE MSB TO TIMER 0 REG


At the rate of 18.2 Hz (or every 54.94 ms), BIOS will make this interrupt avail- 
able by going to the vector table of INT 1CH. The user can define CS:IP of a service rou- 
tine at the vector location belonging to INT 1CH and use it for any purpose, as will be 
seen in Chapter 14. If the user is not using this interrupt, control will automatically be 
returned to BIOS. 
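
The 18.2 Hz rate follows directly from the clock frequency and the divisor. The Python sketch below redoes that arithmetic. The names CLK_HZ, out_frequency, and period_ms are illustrative assumptions, and the clock value is the 1.1931817 MHz figure quoted earlier in this section.

```python
CLK_HZ = 1_193_181.7  # 1.1931817 MHz input to CLK0, CLK1, and CLK2

def out_frequency(clk_hz, divisor):
    """Output frequency of a counter that divides clk_hz by divisor."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return clk_hz / divisor

def period_ms(freq_hz):
    """Period in milliseconds of a signal with the given frequency."""
    return 1000.0 / freq_hz

# Counter 0 is loaded with 0, which means a divisor of 65,536.
irq0_hz = out_frequency(CLK_HZ, 65_536)
print(f"IRQ0 rate: {irq0_hz:.1f} Hz, one tick every {period_ms(irq0_hz):.2f} ms")
```

The same function applies to the other two counters, since all three share the same input clock.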


Using counter 1 


In counter 1, CLK1 is connected to 1.193 MHz and GATE1 is high permanently.
OUT1 generates a periodic pulse required to refresh DRAM memory of the computer.
This refreshing must be done at least every 15 µs for each cell. As will be discussed in
Chapter 15, in the IBM PC the task of refreshing DRAM is performed by the 8237 DMA.
It is up to the 8253's counter 1 to inform DMA periodically, lest the allowed time pass. To
achieve this, OUT1 will provide DMA a pulse approximately every 15 µs, a rate of 66,288
Hz. This means that counter 1 must divide the input frequency 1.19318 MHz by 18
(1.19318 MHz divided by 18 = 66,288 Hz). Using Figure 13-2, the control byte can be
figured out as follows:

D0 = 0 for binary option

D3 D2 D1 = 010 for mode 2 shape output. In this mode, OUT1 stays high for 17 of
every 18 clock pulses and goes low for one pulse. This action is repeated continuously.

D5 D4 = 01 for the LSB only, since the divisor is less than FFH. CLK1 is divided by
18; therefore, 18 is the LSB and there is no need for the MSB.

D7 D6 = 01 for counter 1


    D7 ... D0
    0101 0100 = 54H for the control word


The programming of the 8253 counter 1 in the IBM BIOS is listed as follows, 
with slight modifications for the sake of clarity: 


        MOV  AL,54H       ;the control word
        OUT  43H,AL       ;to control register
        MOV  AL,18        ;18 decimal, the divisor
        OUT  41H,AL       ;to counter 1
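
The control words 36H and 54H were assembled by hand from the four fields of Figure 13-2. As a cross-check, this Python sketch packs the fields into a byte. The function control_word and its parameter names are assumptions made for illustration only.

```python
def control_word(counter, rl, mode, bcd=False):
    """Pack the SC, RL, mode, and BCD fields into an 8253/54 control byte."""
    if counter not in (0, 1, 2):
        raise ValueError("counter must be 0, 1, or 2")
    if rl not in (0, 1, 2, 3):
        raise ValueError("rl must be a 2-bit value")
    if mode not in range(6):
        raise ValueError("mode must be 0 through 5")
    return (counter << 6) | (rl << 4) | (mode << 1) | int(bcd)

# Counter 0: LSB then MSB, mode 3, binary (the system timer tick).
print(f"{control_word(0, 0b11, 3):02X}H")
# Counter 1: LSB only, mode 2, binary (the DRAM refresh request).
print(f"{control_word(1, 0b01, 2):02X}H")
```

The two calls reproduce the control bytes derived in the text for counter 0 and counter 1.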


Using counter 2 


The output of counter 2 is connected to two different devices: the speaker and
PC5 of the 8255. In early models of the IBM PC/XT, it was also connected to the cassette
circuitry. That option has been eliminated in all the IBM PC and PS/2 computers built in
recent years. Since counter 2 in the IBM PC is used to play music, it is important to
understand counter 2 programming thoroughly.


Use of timer 2 by the speaker 


In the IBM PC, CLK2 is connected to a frequency of 1.19318 MHz, and GATE2
is programmed by PB0 of port 61H (port B). The IBM PC uses counter 2 to generate the
beep sound. Although BIOS uses timer 2 for the beep sound, it can be changed to play any
musical note, as will be shown in the next section. The beep sound has a frequency of 896
Hz of mode 3 (square wave). Dividing the input frequency of 1.19318 MHz by 896 Hz
gives 1331 (0533 hex) for the value to be loaded to counter 2. This gives the following
control word:


    D7 ... D0
    1011 0110 = B6H for binary option, mode 3 (square wave), LSB first, then
MSB, counter 2. The program would be as follows:


        MOV  AL,0B6H      ;control word
        OUT  43H,AL
        MOV  AL,33H       ;low byte
        OUT  42H,AL
        MOV  AL,05        ;high byte
        OUT  42H,AL


or it can be written as follows:

TIMER   EQU  40H

        MOV  AL,10110110B ;SEL TIM 2, LSB, MSB, BINARY
        OUT  TIMER+3,AL   ;WRITE THE TIMER MODE REG
        MOV  AX,533H      ;DIVISOR FOR 1000 HZ (896 Hz)
        OUT  TIMER+2,AL   ;WRITE TIMER 2 CNT LSB
        MOV  AL,AH
        OUT  TIMER+2,AL   ;WRITE TIMER 2 CNT MSB


Turning on the speaker via PB0 and PB1 of port 61H


The process of turning on the speaker is the same for all the x86 PCs regardless
of the microprocessor used. As can be seen from Figure 13-4, GATE2 must be high to
provide the CLK to timer 2. This function is performed by PB0 of port 61H. Again from
Figure 13-4, OUT2 of timer 2 is ANDed with PB1 of port 61H, then is input to the driving
circuitry of the speaker. Therefore, to allow OUT2 to go to the speaker, PB1 of port
61H must be set to high as well. The following is the code to turn the speaker on, which
is exactly the same as the IBM BIOS's code to sound the BEEP.


        IN   AL,61H           ;GET THE CURRENT SETTING OF PORT B
        MOV  AH,AL            ;SAVE IT
        OR   AL,00000011B     ;MAKE PB0=1 AND PB1=1
        OUT  61H,AL           ;TURN THE SPEAKER ON

        {HOW LONG THE BEEP SHOULD SOUND GOES HERE}

        MOV  AL,AH            ;GET THE ORIGINAL SETTING OF PORT B
        OUT  61H,AL           ;TURN OFF THE SPEAKER


The amount of time that a musical note is played is referred to as its time delay 
and is produced with the help of the CPU in the x86 PC. 


Time delay for x86 PCs 


Time delays are often needed for various applications. Using the instructions of 
the x86 CPU to generate the delay is unreliable since the CPU speed varies among the x86 
PCs. For example, the following delay subroutine is dependent on CPU speed, and there- 
fore unacceptable: 


        SUB  CX,CX
G7:     LOOP G7
        DEC  BL
        JNZ  G7




This is the reason that the x86 PC provides a scheme to create a time delay using
hardware that is CPU speed independent. To create a CPU-independent delay, the x86 PC makes
PB4 of port 61H toggle every 15.085 microseconds. That means that by monitoring PB4
of port 61H, a fixed time delay can be obtained, as shown next. Upon entering this subroutine,
called WAITF, register CX must hold the number of 15.085-microsecond time
delays needed.


;(CX) = COUNT OF 15.085 MICROSECONDS
WAITF   PROC NEAR
        PUSH AX
WAITF1:
        IN   AL,61H
        AND  AL,10H       ;CHECK PB4
        CMP  AL,AH        ;DID IT JUST CHANGE
        JE   WAITF1       ;WAIT FOR CHANGE
        MOV  AH,AL        ;SAVE THE NEW PB4 STATUS
        LOOP WAITF1       ;CONTINUE UNTIL CX BECOMES 0
        POP  AX
        RET
WAITF   ENDP


Now a time delay of any duration can be created regardless of the x86 CPU frequency.
For example, to create a half-second delay, set CX = 33,144 (33,144 x 15.085 µs
= 1/2 second), and then call the above routine:


        MOV  CX,33144     ;1/2-second delay
        CALL WAITF


See Example 13-4. 


Example 13-4 


Using the BIOS WAITF routine, show how to create a 1.5-second time delay. 


Solution: 


Since the 1.5-second delay requires the counter to be set to 99,436 (1.5 s/15.085 µs = 99,436)
and the maximum value of CX is 65,535, the following method is used to generate the 1.5-second
delay: the half-second delay is repeated three times.


        MOV  BL,3         ;repeat the 1/2-second delay three times
BACK:   MOV  CX,33144     ;1/2-second delay
        CALL WAITF
        DEC  BL
        JNZ  BACK
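
The count passed in CX is just the delay divided by the 15.085-microsecond toggle period, split into pieces when it does not fit in 16 bits. The Python sketch below shows one way to do that bookkeeping. The names TICK_US, MAX_CX, and waitf_counts are hypothetical. It uses as few calls as possible, unlike Example 13-4, which uses three equal half-second calls.

```python
TICK_US = 15.085   # PB4 of port 61H toggles once per this many microseconds
MAX_CX = 65_535    # largest count assumed for a single call to WAITF

def waitf_counts(delay_s):
    """Return the CX values for successive WAITF calls totaling delay_s."""
    if delay_s < 0:
        raise ValueError("delay must not be negative")
    remaining = round(delay_s * 1_000_000 / TICK_US)
    counts = []
    while remaining > 0:
        chunk = min(remaining, MAX_CX)
        counts.append(chunk)
        remaining -= chunk
    return counts

for seconds in (0.25, 0.5, 1.5):
    counts = waitf_counts(seconds)
    total_s = sum(counts) * TICK_US / 1_000_000
    print(f"{seconds} s -> CX values {counts} (about {total_s:.4f} s)")
```

Because the result is rounded to the nearest toggle, a count can differ by one or two from the figures quoted in the text, which makes no audible or practical difference at these durations.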


Review Questions 


1. What port addresses are assigned to the 8253/54 timer on the x86 PC motherboard? 
2. Of the three counters of the 8253/54 timer on the PC motherboard, which one is
used for the speaker, and what port address belongs to it?

3. True or false. In the PC, counters 0 and 1 are used for internal system use. 

4. True or false. While the user can program counter 2, users cannot program counters 
0 and 1 since they are for system use only. 




5. True or false. In the PC, while GATE0 and GATE1 are high permanently, GATE2
can be controlled by the user.
6. In the PC, how is GATE2 controlled by the user?
7. Find the time delay generated by the following code using the method of monitoring
PB4 of port 61H in x86 PCs and compatibles.

        MOV  DL,200
BACK:   MOV  CX,16572     ;delay = 16572 x 15.085 microsec
WAIT:   IN   AL,61H
        AND  AL,10H       ;check PB4
        CMP  AL,AH        ;did it just change?
        JE   WAIT         ;wait for change
        MOV  AH,AL        ;save the new PB4 status
        LOOP WAIT         ;continue until CX becomes 0
        DEC  DL
        JNZ  BACK         ;repeat until DL is 0


SECTION 13.3: GENERATING MUSIC ON THE x86 PC 


As mentioned earlier, counter 2 is connected to the speaker and it can be pro- 
grammed to output any frequency that is desired. First, look at the list of piano notes and 
their frequencies given in Figure 13-5. Since the input frequency to counter 2 is fixed at 
1.1931817 MHz for all x86 PCs, programs for playing music found in this section can run 
on any of them without modification. To play music, the input frequency of 1.1931817 
MHz is divided by the desired output frequency to get the value that must be loaded into 
counter 2. See Examples 13-5 and 13-6. 


Example 13-5


Show the values to be loaded into counter 2 in order to have the output frequency for the notes
(a) D3, (b) A3, and (c) A4.


Solution:


From Figure 13-5, notice that the frequency for note D3 is 147. The value that must be loaded
into counter 2 is 1.1931 MHz divided by 147, which is 8116. Going through this procedure for
each note gives the following:


                        Value Loaded into Counter 2
    Note   Frequency    Decimal    Hex
    D3     147 Hz       8116       1FB4
    A3     220 Hz       5423       152F
    A4     440 Hz       2711       0A97


Now that the values to be loaded into counter 2 are known, the program for getting the speaker 
to sound the notes for a certain duration is shown in Example 13-6. 




Figure 13-5. Piano Note Frequencies 




Example 13-6 


Program counter 2 to play the following notes: D3, A3, A4, for durations of 250, 500, and 500 
ms, respectively. Place a 5-ms silence between each note. 


Solution: This program uses the values calculated in Example 13-5. 


        MOV  AL,0B6H          ;control byte: counter 2, LSB, MSB, binary
        OUT  43H,AL           ;send the control byte to control reg
;load the counter 2 value for D3 and play it for 250 ms
        MOV  AX,1FB4H         ;for D3 note
        OUT  42H,AL           ;the low byte
        MOV  AL,AH
        OUT  42H,AL           ;the high byte
;turn the speaker on
        IN   AL,61H           ;get the current setting of port B
        MOV  AH,AL            ;save it
        OR   AL,00000011B     ;make PB0 = 1 and PB1 = 1
        OUT  61H,AL           ;turn the speaker on
        CALL DELAY            ;play this note for 250 ms
        MOV  AL,AH            ;get the original setting of port B
        OUT  61H,AL           ;turn off the speaker
        CALL DELAY_OFF        ;speaker off for this duration
;load the counter 2 value for A3 and play it for 500 ms
        MOV  AX,152FH         ;for A3 note
        OUT  42H,AL           ;the low byte
        MOV  AL,AH
        OUT  42H,AL           ;the high byte
;turn the speaker on
        IN   AL,61H           ;get the current setting of port B
        MOV  AH,AL            ;save it
        OR   AL,00000011B     ;make PB0 = 1 and PB1 = 1
        OUT  61H,AL           ;turn the speaker on
        CALL DELAY            ;play for 250 ms
        CALL DELAY            ;play for another 250 ms
        MOV  AL,AH            ;get the original setting of port B
        OUT  61H,AL           ;turn off the speaker
        CALL DELAY_OFF        ;speaker off for this duration
;load the counter 2 value for A4 and play it for 500 ms
        MOV  AX,0A97H         ;for A4 note
        OUT  42H,AL           ;the low byte
        MOV  AL,AH
        OUT  42H,AL           ;the high byte
;turn the speaker on
        IN   AL,61H           ;get the current setting of port B
        MOV  AH,AL            ;save it
        OR   AL,00000011B     ;make PB0 = 1 and PB1 = 1
        OUT  61H,AL           ;turn the speaker on
        CALL DELAY            ;play for 250 ms
        CALL DELAY            ;play for another 250 ms
        MOV  AL,AH            ;get the original setting of port B
        OUT  61H,AL           ;turn off the speaker
        CALL DELAY_OFF        ;speaker off for this duration




For a delay of 250 ms in x86 PCs, the following routine can be used. 


;250 ms delay for x86 PCs
DELAY   PROC NEAR
        MOV  CX,16578     ;16578 x 15.08 microsec = 250 ms
        PUSH AX
WAIT:   IN   AL,61H
        AND  AL,10H       ;check PB4
        CMP  AL,AH        ;did it just change?
        JE   WAIT         ;wait for change
        MOV  AH,AL        ;save the new PB4 status
        LOOP WAIT         ;decrement CX and continue until CX becomes 0
        POP  AX
        RET
DELAY   ENDP


A delay of 5 ms between notes can be achieved in the same way. 


DELAY_OFF PROC NEAR
        MOV  CX,331       ;331 x 15.085 microsec = 5 ms
        PUSH AX
WAIT:   IN   AL,61H
        AND  AL,10H       ;check PB4
        CMP  AL,AH        ;did it just change?
        JE   WAIT         ;wait for change
        MOV  AH,AL        ;save the new PB4 status
        LOOP WAIT         ;continue until CX becomes 0
        POP  AX
        RET
DELAY_OFF ENDP


Playing "Happy Birthday" on the PC


This background should be sufficient to develop a program to play any song. The 
tune for the song "Happy Birthday" is given below. 


    Lyrics   Notes   Freq. (Hz)   Duration
    hap      C4      262          1/2
    py       C4      262          1/2
    birth    D4      294          1
    day      C4      262          1
    to       F4      349          1
    you      E4      330          2
    hap      C4      262          1/2
    py       C4      262          1/2
    birth    D4      294          1
    day      C4      262          1

    so       D4      294          3

    hap      B4b     466          1/2
    py       B4b     466          1/2
    birth    A4      440          1
    day      F4      349          1
    to       G4      392          1
    you      F4      349          2


The following program plays the first seven notes of the “Happy Birthday” song 
on Windows NT/2000/XP. Notice we are using arrays for notes and duration. This pro- 
gram was tested using MASM. 


;Tested by Danny Causey and Hanani Bonda
;This program plays the first 7 notes of "Happy Birthday" song
        .MODEL SMALL
        .STACK 64
        .DATA
NOTES    DW   11CAH,11CAH,0FDAH,11CAH,0D5BH,0E1FH,11CAH
Duration DB   2,2,4,4,4,8,2      ;in units of 250 ms
        .CODE
START   PROC FAR
        MOV  AX,@DATA
        MOV  DS,AX
        MOV  AL,0B6H      ;control byte: counter 2, LSB, MSB, binary
        OUT  43H,AL       ;send the control byte to control reg
        MOV  BX,7         ;set up counter
        LEA  SI,NOTES
        LEA  DI,Duration
AGAIN:  MOV  AX,[SI]      ;set up pointer
        OUT  42H,AL
        MOV  AL,AH
        OUT  42H,AL
        MOV  DL,[DI]
        CALL SpkON
        INC  SI           ;increments NOTES pointer
        INC  SI
        INC  DI
        DEC  BX           ;decrements the counter
        JNZ  AGAIN        ;if BX is not zero run the loop until BX=0
        MOV  AH,4CH
        INT  21H
START   ENDP
SpkON   PROC
        IN   AL,61H           ;get the current setting of port B
        MOV  AH,AL            ;save it
        OR   AL,00000011B     ;make PB0 = 1 and PB1 = 1
        OUT  61H,AL           ;turn the speaker on
BACK:   CALL DELAY            ;play this note for 250 ms
        DEC  DL
        CMP  DL,00
        JNE  BACK
        MOV  AL,AH            ;get the original setting of port B
        OUT  61H,AL           ;turn off the speaker
        CALL DELAY_OFF        ;speaker off for this duration
        RET
SpkON   ENDP




DELAY_OFF PROC NEAR
        MOV  CX,331       ;5 ms
        PUSH AX
WAIT0:  IN   AL,61H
        AND  AL,10H       ;check PB4
        CMP  AL,AH        ;did it just change?
        JE   WAIT0        ;wait for change
        MOV  AH,AL        ;save the new PB4 status
        LOOP WAIT0        ;continue until CX becomes 0
        POP  AX
        RET
DELAY_OFF ENDP


DELAY   PROC NEAR
        MOV  CX,16578     ;250 ms
        PUSH AX
WAIT1:  IN   AL,61H
        AND  AL,10H       ;check PB4
        CMP  AL,AH        ;did it just change?
        JE   WAIT1        ;wait for change
        MOV  AH,AL        ;save the new PB4 status
        LOOP WAIT1        ;continue until CX becomes 0
        POP  AX
        RET
DELAY   ENDP
        END  START


In all examples concerning counter 2, values loaded into that counter were calculated
by dividing 1.1931817 MHz, the input to CLK2, by the desired OUT2 frequency. One can
use the x86 to do the calculation as well, by loading 1,193,182 (the CLK2 frequency in hertz)
into registers DX:AX and then dividing it by the desired output frequency using the DIV
instruction.
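
The division described above can be modeled outside the PC to check note values before coding them. The Python sketch below mimics a 32-bit by 16-bit DIV, which keeps only the integer quotient. The names CLK2_HZ and counter2_divisor are illustrative assumptions, and the clock is rounded to 1,193,182 Hz (0012:34DE hex when split across DX:AX).

```python
CLK2_HZ = 1_193_182  # 1.1931817 MHz rounded to the nearest hertz (1234DEH)

def counter2_divisor(freq_hz):
    """Integer quotient CLK2 / freq, as a 16-bit DIV would leave in AX."""
    if not 0 < freq_hz <= 0xFFFF:
        raise ValueError("frequency must fit in a 16-bit divisor")
    quotient, _remainder = divmod(CLK2_HZ, freq_hz)
    if quotient > 0xFFFF:
        # On the x86 this case raises a divide error (INT 00).
        raise OverflowError("quotient does not fit in 16 bits")
    return quotient

for note, freq in [("D3", 147), ("A3", 220), ("A4", 440), ("C4", 262)]:
    value = counter2_divisor(freq)
    print(f"{note}: {freq} Hz -> {value} ({value:04X}H)")
```

The overflow check matters in practice: any requested frequency of 18 Hz or lower gives a quotient larger than 65,535, so a real DIV would fault instead of returning a value.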


Generating music using C# 


High-level languages such as C# do not give a program direct access to I/O ports
through the I/O instructions of Assembly language used so widely in the previous programs.
This prevents us from generating music directly on the timer. However, Microsoft's .NET
architecture provides us with an interface to achieve the same result as follows:


//C# code for the first seven notes of the "Happy Birthday" song
using System;
using System.Threading;
namespace Music
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.Beep(262, 500);     //hap for 500 ms
            Thread.Sleep(5);            //speaker off 5 ms
            Console.Beep(262, 500);     //py for 500 ms
            Thread.Sleep(5);            //speaker off 5 ms
            Console.Beep(294, 1000);    //birth for 1 s
            Thread.Sleep(5);            //speaker off 5 ms
            Console.Beep(262, 1000);    //day for 1 s
            Thread.Sleep(5);            //speaker off 5 ms
            Console.Beep(349, 1000);    //to for 1 s
            Thread.Sleep(5);            //speaker off 5 ms
            Console.Beep(330, 2000);    //you for 2 s
            Thread.Sleep(5);            //speaker off 5 ms
            Console.Beep(262, 500);     //hap for 500 ms
        }
    }
}


Console.Beep(frequency, duration) will play the frequency for the duration in 
milliseconds. Note that the frequencies can range from 37 to 32767. Anything else will 
create an exception. Thread.Sleep(duration) will pause execution for the duration in mil- 
liseconds. Use these functions to generate the rest of the song for practice. 


Review Questions 


1. Find the frequency and the value that must be loaded into the register for counter 2 
to play the following notes. 
(a) C4 (b) D3 (c) E4 (d) F4 

2. Write pseudocode to program counter 2 to play a note. 

3. Of the steps in Question 2, which must the x86 be involved in, and why? 


PROBLEMS 


SECTION 13.1: 8253/54 TIMER 
Note: Problems 1—10 are not necessarily x86 PC compatible. 


1. True or false. Each of the 8253/54 counters must be programmed independently.

2. CLK of the 8253/54 is an (input, output) (square, sine) wave. 

3. Design the decoder for the 8253/54, where A7—A2 = 0010 11 is used to activate 
CS. Use NAND and inverters only. Give the port address for each port of this 
design. 

4. Which of the following addresses cannot be assigned to counter 0 of the 8253/54, 
and why? 23H, 54H, 97H, 51H, FCH, 59H 

5. Give the highest number by which a single counter of the 8253/54 can divide the 
input frequency, and what value is loaded into the counter. Give your answer for 
both binary and BCD options. 

6. True or false. If the divisor is larger than 255, we must send the low byte first, then 
the high byte to the counter. 

7. Find the control word to program counter 1 for mode 3, binary count, low byte
first, followed by high byte.

8. Write a program for Problem 7 if CLK1 = 1.6 MHz and OUT1 = 1200 Hz. Use the
port addresses in Problem 3.

9. Repeat Problem 8 for OUT1 = 250 Hz.

10. In Problem 8, what would be the OUT1 frequency if it is programmed for the maximum
divisor? What if the maximum divisor BCD option were used?


SECTION 13.2: x86 PC 8253/54 TIMER CONNECTION AND PROGRAMMING 


11. State the CLK frequency of all three counters of the 8253/54 in the IBM PC. 

12. State the source of CLK in Problem 11. 

13. What port addresses are assigned to the 8253/54 in the PC? Can they be changed? 

14. State the function of each counter in the 8253/54 of the PC. 

15. True or false. A PC user can program counter 2 only, and should not program 
counters 0 and 1. 


16. State the status of the GATE input for each of the counters of 8253/54 in the PC.

17. Why is a time delay based on the microprocessor's instruction clock count not widely
used?

18. Write a program to generate a 10-second delay using a fixed hardware delay.

SECTION 13.3: GENERATING MUSIC ON THE x86 PC


19. To generate the following notes, state the value programmed into the divisor of
counter 2 in the PC. A3, G5, B6

20. Write a program to play the song "Mary Had a Little Lamb," shown below.

[Table: "Mary Had a Little Lamb" with columns Lyrics, Note, Freq (Hz), Length. Legible rows:
Mar E4 330; had C4 262.]


ANSWERS TO REVIEW QUESTIONS 


SECTION 13.1: 8253/54 TIMER 


1. True
2. Divide
3. 60H is the base address and 63H is the address for the control register.
4. B3H
5. False; it must be sent to the counter 1 register.
6. The port address of 61H
7. D0 = 0 since the maximum BCD number is 10,000 but the binary (hex) option goes as
high as 65,536.
8. 65,536




SECTION 13.2: x86 PC 8253/54 TIMER CONNECTION AND PROGRAMMING 


1. 40H, 41H, 42H, and 43H
2. Counter 2 at port 42H
3. True
4. True
5. True
6. Using PB0 of port address 61H
7. Monitoring of PB4 of port address 61H provides us 16,572 x 15.085 µs = 0.25 s, and
200 x 0.25 s = 50 s.


SECTION 13.3: GENERATING MUSIC ON THE x86 PC 


1. Since CLK2 = 1.1931817 MHz, we must divide this input frequency by the desired
OUT2 frequency of each note to get the value to be loaded into counter 2. Therefore,
we have: (a) 262, 4554 (b) 147, 8116 (c) 330, 3616 (d) 349, 3419
2. The sequence is as follows:
(a) Load the control byte for the 8253/54.
(b) Load the divisor into port 42H.
(c) Get the status of port 61H and save it.
(d) Turn the speaker on by setting high both PB0 and PB1.
(e) Let the 8253/54 play the note.
(f) Use the x86 to generate a time delay for the duration of the note.
(g) Turn off the speaker by restoring the original status of port 61H.
3. All of them except step (e) since playing of the notes is performed by the 8253/54, 
independent of the x86. 




CHAPTER 14 


INTERRUPTS IN x86 PC 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Explain how the x86 PC executes interrupts by using the interrupt 
vector table and interrupt routines 

>> List the differences between interrupts and CALL instructions 

>> Describe the differences between hardware and software interrupts 

>> Examine the ISR for any interrupt, given its interrupt number 

>> Describe the function of each pin of the 8259 programmable interrupt 
controller (PIC) chip 

>> Explain the purpose of each of the four control words of the 8259 and
demonstrate how they are programmed

>> Examine the interrupts in x86 PCs 




This chapter examines the interrupts in x86 PCs. We also discuss sources of hardware
interrupts in the x86 PC. In Section 14.1 we discuss the concept of interrupts in
the 8088/86 CPU, then in Section 14.2 we look at the interrupt assignment of the original
IBM PC. Section 14.3 examines the 8259 interrupt controller chip in detail. Use of the
8259 chip in the x86 PC is discussed in Section 14.4. In Section 14.5, hardware interrupts
and interrupt assignments in the x86 PC are discussed.


SECTION 14.1: 8088/86 INTERRUPTS 


An interrupt is an external event that informs the CPU that a device needs its service.
In the 8088/86 there are a total of 256 interrupts: INT 00, INT 01, ..., INT FF (sometimes
called TYPEs).

When an interrupt is executed, the microprocessor automatically saves the flag register
(FR), the instruction pointer (IP), and the code segment register (CS) on the stack, and
goes to a fixed memory location. In x86 PCs, the memory location to which an interrupt goes
is always four times the value of the interrupt number. For example, INT 03 will go to
address 0000CH (4 x 3 = 12 = 0CH). Table 14-1 is a partial list of the interrupt
vector table.


Table 14-1: Interrupt Vector

    INT Number   Physical Address   Logical Address


Interrupt service routine (ISR)


For every interrupt there must be a program associated with it. When an interrupt
is invoked it is asked to run a program to perform a certain service. This program is
commonly referred to as an interrupt service routine (ISR). The interrupt service routine is
also called the interrupt handler. When an interrupt is invoked, the CPU runs the interrupt
service routine. Now the question is, where is the address of the interrupt service routine?
As can be seen from Table 14-1, for every interrupt there are allocated four bytes of memory
in the interrupt vector table. Two bytes are for the IP and the other two are for the CS
of the ISR. These four memory locations provide the addresses of the interrupt service
routine for which the interrupt was invoked. Thus the lowest 1024 bytes (256 x 4 = 1024)
of memory space are set aside for the interrupt vector table and must not be used for any
other function. Figure 14-1 provides a list of interrupts and their designated functions as
defined by Intel Corporation.


Example 14-1


Find the physical and logical addresses in the interrupt vector table associated with:
(a) INT 12H (b) INT 8


Solution:


(a) The physical addresses for INT 12H are 00048H-0004BH since (4 x 12H = 48H). That
means that the physical memory locations 48H, 49H, 4AH, and 4BH are set aside for the CS
and IP of the ISR belonging to INT 12H. The logical address is 0000:0048H-0000:004BH.
(b) For INT 8, we have 8 x 4 = 32 = 20H; therefore, memory addresses 00020H,
00021H, 00022H, and 00023H in the interrupt vector table hold the CS:IP of the INT 8 ISR.
The logical address is 0000:0020H-0000:0023H.
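
The "four times the interrupt number" rule in Example 14-1 can be expressed as a small calculation. The Python sketch below is illustrative only; the function name vector_entry and the returned dictionary are assumptions. It relies on the fact that the two IP bytes occupy the lower pair of addresses and the two CS bytes the upper pair.

```python
def vector_entry(int_number):
    """Return the physical addresses of the IP and CS words for INT nn."""
    if not 0 <= int_number <= 0xFF:
        raise ValueError("interrupt number must be 00 through FFH")
    start = 4 * int_number
    return {
        "ip": (start, start + 1),      # low word: offset of the ISR
        "cs": (start + 2, start + 3),  # high word: segment of the ISR
    }

for n in (0x12, 0x08, 0x03, 0xFF):
    entry = vector_entry(n)
    first, last = entry["ip"][0], entry["cs"][1]
    print(f"INT {n:02X}H: {first:05X}H-{last:05X}H "
          f"(logical 0000:{first:04X}H-0000:{last:04X}H)")
```

For INT FFH the last entry ends at 003FFH, which is why the table occupies exactly the lowest 1024 bytes of memory.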


    INT FF                              (starts at 0003FC)
    ...
    INT 06
    INT 05
    INT 04    signed number overflow
    INT 03    breakpoint
    INT 02    NMI
    INT 01    single-step
    INT 00    divide error


Figure 14-1. Intel's List of Designated Interrupts for the 8088/86 
Difference between INT and CALL instructions 


If the INT instruction saves the CS:IP of the following instruction and jumps indirectly
to the subroutine associated with the interrupt, what is the difference between that
and a CALL FAR instruction, which also saves the CS:IP and jumps to the desired subroutine
(procedure)? The differences can be summarized as follows:


1. A "CALL FAR" instruction can jump to any location within the 1-megabyte
address range of the 8088/86 CPU, but "INT nn" goes to a fixed memory location
in the interrupt vector table to get the address of the interrupt service routine.

2. A "CALL FAR" instruction is used by the programmer in the sequence of instructions
in the program but an externally activated hardware interrupt can come in at
any time, requesting the attention of the CPU.

3. A "CALL FAR" instruction cannot be masked (disabled), but "INT nn" belonging
to externally activated hardware interrupts can be masked. This is discussed in a
later section.

4. A "CALL FAR" instruction automatically saves only CS:IP of the next instruction
on the stack, while "INT nn" saves FR (flag register) in addition to CS:IP of the
next instruction.

5. At the end of the subroutine that has been called by the "CALL FAR" instruction,
the RETF (return FAR) is the last instruction, whereas the last instruction in the
interrupt service routine (ISR) for "INT nn" is the instruction IRET (interrupt
return). The difference is that RETF pops CS and IP off the stack but the IRET
pops off the FR (flag register) in addition to CS and IP.
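
Points 4 and 5 of the list above come down to what each instruction pushes and what its matching return pops. The Python sketch below models the stack as a list of 16-bit words. All function names are hypothetical, and the model leaves out other effects of an interrupt, such as changes to individual flag bits.

```python
def call_far(stack, cs, ip):
    """CALL FAR saves CS, then IP, of the next instruction."""
    stack.append(cs)
    stack.append(ip)

def retf(stack):
    """RETF restores IP, then CS."""
    ip = stack.pop()
    cs = stack.pop()
    return cs, ip

def int_nn(stack, flags, cs, ip):
    """INT nn saves FR first, then CS and IP of the next instruction."""
    stack.append(flags)
    stack.append(cs)
    stack.append(ip)

def iret(stack):
    """IRET restores IP, CS, and finally FR."""
    ip = stack.pop()
    cs = stack.pop()
    flags = stack.pop()
    return flags, cs, ip

stack = []
call_far(stack, 0x1000, 0x0105)
print("words pushed by CALL FAR:", len(stack), "restored:", retf(stack))
int_nn(stack, 0xF246, 0x1000, 0x0105)
print("words pushed by INT nn:", len(stack), "restored:", iret(stack))
```

Pairing the wrong return with the wrong entry (for example, RETF at the end of an ISR) would leave the saved FR word on the stack, which is why an ISR must end with IRET.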


CHAPTER 14: INTERRUPTS IN x86 PC 369 


Categories of interrupts 


"INT nn" is a 2-byte instruction where the first byte is for the opcode and the sec- 
ond byte is the interrupt number. This means that we can have a maximum of 256 (INT 
00 through INT FFH) interrupts. Of these 256 interrupts, some are used for software interrupts
and some are for hardware interrupts. 


Hardware interrupts 


As we saw in Chapters 9 and 10, there are three pins in the x86 that are associat- 
ed with hardware interrupts. They are INTR (interrupt request), NMI (nonmaskable inter- 
rupt), and INTA (interrupt acknowledge). The use of INTA will be discussed in Section 
14.3. INTR is an input signal into the CPU, which can be masked (ignored) and unmasked 
through the use of instructions CLI and STI. However, NMI, which is also an input sig- 
nal into the CPU, cannot be masked and unmasked using instructions CLI and STI, and 
for this reason it is called a nonmaskable interrupt. INTR and NMI are activated external- 
ly by putting 5 V on the pins of NMI and INTR of the x86 microprocessor. When either 
of these interrupts is activated, the x86 finishes the instruction that it is executing, pushes 
FR and the CS:IP of the next instruction onto the stack, then jumps to a fixed location in 
the interrupt vector table and fetches the CS:IP for the interrupt service routine (ISR) asso- 
ciated with that interrupt. At the end of the ISR, the IRET instruction causes the CPU to 
get (pop) back its original FR and CS:IP from the stack, thereby forcing the CPU to con- 
tinue at the instruction where it left off when the interrupt came in. 

Intel has embedded "INT 02" into the x86 microprocessor to be used only for 
NMI. Whenever the NMI pin is activated, the CPU will go to memory location 00008 to 
get the address (CS:IP) of the interrupt service routine (ISR) associated with NMI. 
Memory locations 00008, 00009, 0000A, and 0000B contain the 4 bytes of CS:IP of the 
ISR belonging to NMI. In contrast, this is not the case for the other hardware pin, INTR. 
There is no specific location in the vector table assigned to INTR. The reason is that INTR 
is used to expand the number of hardware interrupts and should be allowed to use any 
"INT nn" that has not been previously assigned. The 8259 programmable interrupt con- 
troller (PIC) chip can be connected to INTR to expand the number of hardware interrupts 
to 64. In the case of the IBM PC, one Intel 8259 PIC chip is used to add a total of 8 hard- 
ware interrupts to the microprocessor. IBM PC AT, PS/2 80286, 80386, 80486, and Intel 
Pentium computers use two 8259 chips to allow up to 16 hardware interrupts. The design 
of hardware interrupts and the use of the 8259 in the IBM PC are covered in Sections 14.3 
and 14.4, while ISA bus interrupts are covered in Section 14.5. 


Software interrupts 


If an ISR is called upon as a result of the execution of an x86 instruction such as 
"INT nn", it is referred to as a software interrupt since it was invoked from software, not 
from external hardware. Examples of such interrupts are DOS "INT 21H" function calls 
and video interrupts "INT 10H", which were covered in Chapter 4. These interrupts can 
be invoked in the sequence of code just like a CALL or any other x86 instruction. Many 
of the interrupts in this category are used by the MS DOS operating system and IBM 
BIOS to perform essential tasks that every computer must provide to the system and the 
user. Within this group of interrupts there are also some predefined functions associated 
with some of the interrupts. They are "INT 00" (divide error), "INT 01" (single step), "INT 
03" (breakpoint), and "INT 04" (signed number overflow). Each is described below. These 
interrupts are shown in Figure 14-1. Aside from "INT 00" to "INT 04", which have
predefined functions, the rest of the interrupts from "INT 05" to "INT FF" can be used to
implement either software or hardware interrupts.


Interrupts and the flag register 


Among bits D0 to D15 of the flag register, there are two bits that are associated
with the interrupt: D9, or IF (interrupt enable flag), and D8, or TF (trap or single step flag).
In addition, OF (overflow flag) can be used by the interrupt. See Figure 14-2.


The 16 bits of the flag register:

Bit    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Flag    R  R  R  R OF DF IF TF SF ZF  U AF  U PF  U CF

R = reserved, U = undefined
OF = overflow flag, DF = direction flag, IF = interrupt flag, TF = trap flag,
SF = sign flag, ZF = zero flag, AF = auxiliary carry flag, PF = parity flag,
CF = carry flag


Figure 14-2. Flag Register

The interrupt flag is used to mask (ignore) any hardware interrupt that may come 
in from the INTR pin. If IF = 0, all hardware interrupt requests through INTR are ignored. 
This has no effect on interrupts coming from the NMI pin or "INT nn" instructions. The 
instruction CLI (clear interrupt flag) will make IF = 0. To allow interrupt requests through 
the INTR pin, this flag must be set to one (IF = 1). The STI (set interrupt flag) instruction 
can be used to set IF to 1. Section 14.3 will show how to use STI and CLI to mask or allow 
interrupts through the INTR pin. The trap flag (TF) is explained below in the discussion 
of "INT 01", the single step interrupt. 


Processing interrupts 


When the 8088/86 processes any interrupt (software or hardware), it goes through 
the following steps: 


1. The flag register (FR) is pushed onto the stack and SP is decremented by 2, since
   FR is a 2-byte register.

2. IF (interrupt enable flag) and TF (trap flag) are both cleared (IF = 0 and TF = 0).
   This masks (causes the system to ignore) interrupt requests from the INTR pin and
   disables single stepping while the CPU is executing the interrupt service routine.
   Depending on the nature of the interrupt procedure, a programmer can unmask the
   INTR pin by the STI instruction.

3. The current CS is pushed onto the stack and SP is decremented by 2.

4. The current IP is pushed onto the stack and SP is decremented by 2.

5. The INT number (type) is multiplied by 4 to get the physical address of the
   location within the vector table to fetch the CS and IP of the interrupt service routine.

6. From the new CS:IP, the CPU starts to fetch and execute instructions belonging to
   the ISR program.

7. The last instruction of the interrupt service routine must be IRET, to get IP, CS, and
   FR back from the stack and make the CPU run the code where it left off.
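
The steps above can be traced with a short Python model. This is only a sketch under stated assumptions: memory is a flat 1-megabyte bytearray, the registers live in a dictionary, and regs["IP"] is assumed to already point at the instruction following the interrupted one. The names push16, pop16, do_interrupt, and do_iret are illustrative and do not belong to any real tool. The INT 12H vector F000:F841 is the value listed in Table 14-2.

```python
# Toy model of the 8088/86 interrupt sequence (a sketch, not a CPU emulator).
IF_MASK = 1 << 9   # interrupt enable flag, D9
TF_MASK = 1 << 8   # trap flag, D8

def push16(mem, regs, value):
    """Push a 16-bit word: SP drops by 2, low byte goes to the lower address."""
    regs["SP"] = (regs["SP"] - 2) & 0xFFFF
    addr = (regs["SS"] << 4) + regs["SP"]
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF

def pop16(mem, regs):
    """Pop a 16-bit word and raise SP by 2."""
    addr = (regs["SS"] << 4) + regs["SP"]
    value = mem[addr] | (mem[addr + 1] << 8)
    regs["SP"] = (regs["SP"] + 2) & 0xFFFF
    return value

def do_interrupt(mem, regs, n):
    push16(mem, regs, regs["FR"])                      # step 1: save flags
    regs["FR"] &= ~(IF_MASK | TF_MASK) & 0xFFFF        # step 2: IF = 0, TF = 0
    push16(mem, regs, regs["CS"])                      # step 3: save CS
    push16(mem, regs, regs["IP"])                      # step 4: save IP
    vec = n * 4                                        # step 5: vector address
    regs["IP"] = mem[vec] | (mem[vec + 1] << 8)        # low word of vector = IP
    regs["CS"] = mem[vec + 2] | (mem[vec + 3] << 8)    # high word = CS (step 6)

def do_iret(mem, regs):
    regs["IP"] = pop16(mem, regs)                      # step 7: IP, CS, FR come
    regs["CS"] = pop16(mem, regs)                      # back in reverse order
    regs["FR"] = pop16(mem, regs)

mem = bytearray(1 << 20)
mem[0x12 * 4 : 0x12 * 4 + 4] = bytes([0x41, 0xF8, 0x00, 0xF0])  # INT 12H = F000:F841
regs = {"CS": 0x1131, "IP": 0x0102, "SS": 0x1131, "SP": 0xFFEE, "FR": 0x0202}

do_interrupt(mem, regs, 0x12)
after_int = dict(regs)      # snapshot while "inside" the ISR
do_iret(mem, regs)
```

In this model, after_int shows CS:IP pointing at the ISR with SP lowered by 6 and IF cleared, and the final regs match the starting values.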




Functions associated with INT 00 to INT 04 


As mentioned earlier, interrupts INT 00 to INT 04 have predefined tasks (func- 
tions) and cannot be used in any other way. The function of each is described next. 


INT 00 (divide error) 


This interrupt belongs to the category of interrupts referred to as conditional or
exception interrupts. They are invoked internally by the microprocessor whenever there
is a condition (exception) that the CPU is unable to handle. One such situation is an
attempt to divide a number by zero. Since the result of dividing a number by zero is
undefined, and the CPU has no way of handling such a result, it automatically invokes the
divide error exception interrupt. In the 8088/86 microprocessor, out of 256 interrupts, Intel
has set aside only INT 0 for exception handling. Later x86 CPUs have many more
exception-handling interrupts, which are discussed in Section 14.5. In the x86 PC, the
service subroutine for INT 00 is responsible for displaying the message
"DIVIDE ERROR" on the screen if a program such as the following is executed:


        MOV  AL,92      ;AL=92
        SUB  CL,CL      ;CL=0
        DIV  CL         ;92/0=undefined result


INT 0 is also invoked if the quotient is too large to fit into the assigned register 
when executing a DIV instruction. Look at the following case: 


        MOV  AX,0FFFFH  ;AX=FFFFH
        MOV  BL,2       ;BL=2
        DIV  BL         ;65535/2 = 32767 larger than 255
                        ;maximum capacity of AL


Put INT 3 at the end of the above two programs in DEBUG and see the reaction 
of the PC. For further discussion of divide error interrupts due to an oversized quotient, 
see Chapter 3. 
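
The two divide error cases can be checked with a small Python model of the byte form of DIV. It is a sketch only: the function name div8 and the exception class DivideError are made up for illustration, with the exception standing in for the type 0 interrupt.

```python
class DivideError(Exception):
    """Stands in for the type 0 (divide error) interrupt."""

def div8(ax, divisor):
    """Model DIV with an 8-bit divisor: AX / divisor -> (AL quotient, AH remainder)."""
    ax &= 0xFFFF
    divisor &= 0xFF
    if divisor == 0:
        raise DivideError("divide by zero")
    quotient, remainder = divmod(ax, divisor)
    if quotient > 0xFF:
        # The quotient does not fit in AL, so the CPU also raises INT 0.
        raise DivideError("quotient too large for AL")
    return quotient, remainder

def raises_divide_error(ax, divisor):
    """Helper: True if the division would invoke INT 0."""
    try:
        div8(ax, divisor)
    except DivideError:
        return True
    return False
```

For instance, raises_divide_error(92, 0) and raises_divide_error(0xFFFF, 2) are both true, matching the two programs above, while div8(92, 3) returns a quotient of 30 and a remainder of 2.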


INT 01 (single step) 


In executing a sequence of instructions, there is a need to examine the contents of 
the CPU's registers and system memory. This is often done by executing the program one 
instruction at a time and then inspecting registers and memory. This is commonly referred 
to as single-stepping, or performing a trace. Intel has designated INT 01 specifically for 
implementation of single-stepping. To single-step, the trap flag (TF), D8 of the flag reg- 
ister, must be set to 1. Then after execution of each instruction, the 8088/86 automatical- 
ly jumps to physical location 00004 to fetch the 4 bytes for CS:IP of the interrupt service 
routine, whose job is, among other things, to dump the registers onto the screen. Now the 
question is, how is the trap flag set or reset? Although Intel has not provided any specific 
instruction for this purpose (unlike IF, which uses STI and CLI instructions to set or reset), 
one can write a simple program to do that. The following shows how to make TF = 0: 


        PUSHF                       ;push the flag register onto the stack
        POP  AX                     ;AX = flag register
        AND  AX,1111111011111111B   ;clear D8 (TF)
        PUSH AX
        POPF                        ;flag register now has TF = 0


Recall that TF is D8 of the flag register. The analysis of the above program
is left to the reader. To make TF = 1, one simply uses the OR instruction with the mask
0000000100000000B in place of the AND instruction above.
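
The masking arithmetic can be verified outside DEBUG. The following Python sketch models only the AND and OR steps on a 16-bit flag word; the function names clear_tf and set_tf are illustrative.

```python
TF_MASK = 0x0100            # D8 of the flag register

def clear_tf(flags):
    """Same effect as AND AX,1111111011111111B on the flag word."""
    return flags & ~TF_MASK & 0xFFFF

def set_tf(flags):
    """Same effect as OR AX,0000000100000000B on the flag word."""
    return (flags | TF_MASK) & 0xFFFF
```

Only bit D8 changes in either direction; every other flag bit passes through untouched, which is why the flags are read, modified, and written back rather than overwritten.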


INT 02 (nonmaskable interrupt) 


All Intel x86 microprocessors have a pin designated NMI. It is an active-high 
input. Intel has set aside INT 2 for the NMI interrupt. Whenever the NMI pin of the x86 
is activated by a high (5 V) signal, the CPU jumps to physical memory location 00008 to 
fetch the CS:IP of the interrupt service routine associated with NMI. Section 14.4 contains 
a detailed discussion of its purpose and application. 




INT 03 (breakpoint) 


To allow implementation of breakpoints in software engineering, Intel has set 
aside INT 03 solely for that purpose. Whereas in single-step mode, one can inspect the 
CPU and system memory after the execution of each instruction, a breakpoint is used to 
examine the CPU and memory after the execution of a group of instructions. One inter- 
esting point about INT 3 is the fact that it is a 1-byte instruction. This is in contrast to all 
other interrupt instructions of the form "INT nn", which are 2-byte instructions. 


INT 04 (signed number overflow) 


This interrupt is invoked by a signed number overflow condition. There is an 
instruction associated with this, INTO (interrupt on overflow). For a detailed discussion 
of signed number overflow, see Chapter 6. If the instruction INTO is placed after a signed 
number arithmetic or logic operation such as IMUL or ADD, the CPU will activate INT 
04 if OF = 1. In cases where OF = 0, the INTO instruction is not executed but is bypassed 
and acts as a NOP (no operation) instruction. To understand that, look at the following 
example. 


        MOV  AL,DATA1
        MOV  BL,DATA2
        ADD  AL,BL      ;add BL to AL
        INTO


Suppose in the above program that DATA1 = +64 = 0100 0000 and DATA2 = +64 
= 0100 0000. The INTO instruction will be executed and the 8088/86 will jump to phys- 
ical location 00010H, the memory location associated with INT 04. The carry from D6 to 
D7 causes the overflow flag to become 1. 


          + 64     0100 0000
        + + 64     0100 0000
          +128     1000 0000     OF=1 and the result is not +128


The above incorrect result causes OF to be set to 1. INTO causes the CPU to per- 
form "INT 4" and jump to physical location 00010H of the vector table to get the CS:IP 
of the service routine. Suppose that the data in the above program was DATA1 = +64 and 
DATA2 = +17. In that case, OF would become 0; the INTO is not executed and acts sim- 
ply as a NOP (no operation) instruction. 
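
As a quick check of the two cases, the overflow flag for an 8-bit ADD can be computed in Python. This is a sketch; add8 and into_vector_address are illustrative names, and the overflow test used here (both operands differ in sign from the result) is one common way to express the rule.

```python
def add8(a, b):
    """Return (8-bit result, OF) for ADD on two 8-bit operands."""
    a &= 0xFF
    b &= 0xFF
    result = (a + b) & 0xFF
    # Signed overflow: the operands share a sign and the result's sign differs.
    overflow = ((a ^ result) & (b ^ result) & 0x80) != 0
    return result, overflow

def into_vector_address(overflow):
    """INTO behaves like INT 4 when OF = 1 and like NOP when OF = 0."""
    return 4 * 4 if overflow else None
```

With +64 and +64 the result is 80H with OF = 1, so INTO fetches its vector from 00010H; with +64 and +17 the result is 51H with OF = 0 and INTO falls through.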


Review Questions 


1. True or false. When any interrupt (software or hardware) is activated, the CPU 
jumps to a fixed and unique address. 

2. There are ____ bytes of memory in the interrupt vector table for each "INT nn".

3. How many K bytes of memory are used by the interrupt vector table, and what are
the beginning and ending addresses of the table?

4. The program associated with an interrupt is referred to as ____.

5. What is the function of the interrupt vector table? 

6. What physical memory locations in the interrupt vector table hold the CS:IP of INT 
10H? 

7. The 8088/86 has assigned INT 2 to NMI. Can that be changed? 

8. Which interrupt is assigned to divide error exception handling? 




SECTION 14.2: x86 PC AND INTERRUPT ASSIGNMENT 


Of the 256 possible interrupts in the x86, some are used by the PC peripheral 
hardware (BIOS), some are used by the Microsoft operating system, and the rest are avail- 
able for programmers of software applications. Table 14-2 lists many of the PC interrupts. 
Example 14-2 shows the physical address as it relates to the logical address. 


Example 14-2 


For a given ISR, the logical address is F000:FF53. Verify that the physical address is FFF53H. 


Solution: 


Since the logical address is F000:FF53, this means that CS = F000H and IP = FF53H. Shifting
left the segment register one hex digit and adding it to the offset gives the physical address
FFF53H.
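
The same calculation in Python, as a minimal sketch (the function name physical_address is illustrative; the 20-bit mask models the 1-megabyte address space of the 8088/86):

```python
def physical_address(segment, offset):
    """Shift the segment left one hex digit (4 bits) and add the offset."""
    return ((segment << 4) + offset) & 0xFFFFF

isr_address = physical_address(0xF000, 0xFF53)
```

Formatting isr_address with format(isr_address, "05X") gives the five-digit hex form used in the text.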


Examining the interrupt vector table of your PC 


Example 14-3 shows how to use DEBUG's dump command to examine the inter- 
rupt vector table of a x86 PC, regardless of which x86 CPU it contains. 

From the CS:IP address of the ISR, it is possible to determine which source pro- 
vides the service: DOS or BIOS. This is shown in Example 14-4. 


Analyzing an x86 PC BIOS interrupt service routine


To understand the structure of an ISR, we examine the interrupt service routine of
INT 12H from IBM BIOS. The interrupt 12H service is available on any PC with an x86
microprocessor.


INT 12H: checking the size of RAM on the IBM PC


IBM PC BIOS uses INT 12H to provide the amount of installed conventional (0
to 640K bytes) RAM memory on the system. By system is meant both the motherboard
and expansion boards. One of the functions of the BIOS POST (power on self test) is to

Example 14-3


(a) Use the DEBUG dump command to dump the contents of memory locations 00000-0002FH. 
(b) Find the CS:IP of divide error, NMI, and INT 8. 


Solution: 
(a) It is very possible that the data you get on your PC will be different from the following dump, 
depending on the OS version and the BIOS chip date of your PC. 

C>debug
-D 0000:0000 002F
0000:0000  E8 56 2B 02 56 07 70 00-C3 E2 00 F0 56 07 70 00   .V+.V.p.....V.p.
0000:0010  56 07 70 00 54 FF 00 F0-47 FF 00 F0 47 FF 00 F0   V.p.T...G...G...
0000:0020  A5 FE 00 F0 87 E9 00 F0-DD E6 00 F0 DD E6 00 F0   ................

(b) For the divide error interrupt (INT 0), CS:IP is located at addresses 0, 1, 2, 3. Remember that
because of the little endian convention, the low address has the low value; therefore, IP = 56E8H
and CS = 022BH. By the same token, NMI's INT 2 is located at the vector table addresses of 8, 9,
A, and B. Therefore, we have IP = E2C3H and CS = F000H. The CS:IP of the INT 8 ISR is located in
addresses starting at 00020 since 8 x 4 = 32 = 20H, giving IP = FEA5H and CS = F000H.
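
The decoding in part (b) can be mechanized. The Python sketch below uses the first 36 bytes of the dump (in the form consistent with the solution, IP = 56E8H for INT 0) and an illustrative helper named vector; it is not a DEBUG feature.

```python
# First 0x24 bytes of the interrupt vector table from the dump above.
table = bytes.fromhex(
    "E8 56 2B 02 56 07 70 00 C3 E2 00 F0 56 07 70 00"
    "56 07 70 00 54 FF 00 F0 47 FF 00 F0 47 FF 00 F0"
    "A5 FE 00 F0"
)

def vector(table, n):
    """Return (CS, IP) for INT n: IP is the low word, CS the high word."""
    offset = n * 4
    ip = int.from_bytes(table[offset:offset + 2], "little")
    cs = int.from_bytes(table[offset + 2:offset + 4], "little")
    return cs, ip

divide_error = vector(table, 0)
nmi = vector(table, 2)
timer = vector(table, 8)
```

Each vector is read low byte first, so the byte pair E8 56 becomes 56E8H, exactly as the little endian rule in the solution states.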




Table 14-2: IBM PC Interrupt System

Interrupt  Logical Addr.  Physical Addr.  Purpose
0          00E3:3072      03EA2           Divide error
1          0600:08ED      068ED           Single-step (trace command in DEBUG)
2          F000:E2C3      FE2C3           Nonmaskable interrupt
3          0600:08E6      068E6           Breakpoint
4          0700:0147      07147           Signed number arithmetic overflow
5          F000:FF54      FFF54           Print screen (BIOS)
6, 7       -              -               Reserved
8          F000:FEA5      FFEA5           IRQ0 of 8259 (BIOS timer interrupt)
9          F000:E987      FE987           IRQ1 of 8259 (BIOS keyboard interrupt)
A          -              -               IRQ2 of 8259 (reserved)
B          -              -               IRQ3 of 8259 (reserved for serial com 2)
C          -              -               IRQ4 of 8259 (reserved for serial com 1)
D          -              -               IRQ5 of 8259 (reserved for hard disk XT)
E          F000:EF57      FEF57           IRQ6 of 8259 (floppy disk)
F          0070:0147      00847           IRQ7 of 8259 (parallel printer LPT 1)
10         F000:F065      FF065           Video I/O (BIOS)
11         F000:F84D      FF84D           Equipment configuration check (BIOS)
12         F000:F841      FF841           Memory size check (BIOS)
13         F000:EC59      FEC59           Disk I/O (BIOS)
14         F000:E739      FE739           RS-232 I/O (BIOS)
15         F000:F859      FF859           Cassette I/O (BIOS)
16         F000:E82E      FE82E           Keyboard I/O (BIOS)
17         F000:EFD2      FEFD2           Parallel printer I/O (BIOS)
18         F600:0000      F6000           Load ROM BASIC
19         F000:E6F2      FE6F2           Load boot-strap (BIOS)
1A         F000:FE6E      FFE6E           Time-of-day (BIOS)
1B         0070:0140      00840           Ctrl-Brk control (BIOS)
1C         F000:FF53      FFF53           Timer control
1D         F000:F0A4      FF0A4           Video parameters table
1E         0000:0522      00522           Floppy disk parameters table
1F         00E3:0B07      01937           Graphics character table (DOS 3.0 and up)
20         PSP:0000       -               DOS program terminate
21         Relocatable    -               DOS function calls
22         PSP:000A       -               DOS terminate address
23         PSP:000E       -               DOS Ctrl-Brk exit address
24         PSP:0012       -               DOS critical error-handling vector
25         Relocatable    -               DOS absolute disk read
26         Relocatable    -               DOS absolute disk write
27         Relocatable    -               DOS terminate but stay resident (TSR)
28-2E      -              -               Reserved for DOS
2F         Relocatable    -               Multiplex interrupt
30-3F      -              -               Reserved for DOS
40         -              -               Disk I/O (XT)
41         -              -               Fixed (hard) disk parameters (XT)
42-5F      -              -               Reserved for DOS
60-66      -              -               User defined
67         -              -               Expanded memory manager
68-7F      -              -               Not used
80-85      -              -               Reserved for BASIC
86-F0      -              -               BASIC interpreter
F1-FF      -              -               Not used


Example 14-4 


Examine the answers for Example 14-3(b) to determine whether DOS or BIOS provides the ISR 
for the divide error and NMI. 


Solution: 


In the case of INT 0 (divide error), the logical address for CS:IP is 022B:56E8. This results in 
a physical address of 07998H for the divide error interrupt service routine. This area of memo- 
ry belongs to DOS, as discussed in Chapter 10. For NMI interrupt 2, we have a logical address 
of CS:IP = F000:E2C3, which corresponds to physical address FE2C3H. This is the BIOS ROM 
area. 


test and count the total K bytes of conventional RAM memory installed on the system and 
write it in memory locations 00413H and 00414H, which have been set aside for this pur- 
pose in the BIOS data area. In Chapter 10, we showed the data area used by BIOS. The 
job of INT 12H is to copy that value from memory locations 00413H and 00414H into AX 
and return. In other words, after executing INT 12H, AX will contain the total K bytes of 
conventional RAM memory on the system. This value is in hex and must be converted to 
decimal to get values of 1 to 640K bytes. The interrupt service routine for INT 12H looks 
as follows in the IBM PC Technical Reference. 


MEM_SIZE  PROC FAR
          STI                   ;interrupt back on
          PUSH DS               ;save segment
          SUB  AX,AX            ;set DS = 0
          MOV  DS,AX            ;for BIOS data area
          MOV  AX,DS:[0413H]    ;conv mem size in 413,414
          POP  DS               ;recover segment
          IRET                  ;return to caller
MEM_SIZE  ENDP


See Examples 14-5 and 14-6. 


Example 14-5 


Execute INT 12H followed by INT 3 (breakpoint) in DEBUG. Verify the memory size. 
Solution: 


C>DEBUG
-A
1131:0100 INT 12
1131:0102 INT 3
1131:0103
-G
AX=0280 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1131 ES=1131 SS=1131 CS=1131 IP=0102 NV UP EI PL NZ NA PO NC
1131:0102 CC INT 3
-Q


AX = 0280, which is in hex format. Converting it to decimal gives a size of 640K bytes of mem- 
ory installed on this computer. 


376 


Example 14-6 


Use the DEBUG D (dump) command to dump memory locations 0040:0000 to 0040:001FH and
inspect the contents of locations 0040:13 and 0040:14. Does this match the result of Example
14-5?


Solution: 


C>DEBUG 

-D 0040:0000 1F 

0040:0010 6F 94 00 80 02 40 02 00-00 00 22 00 22 00 66 21 o...@....".".£! 
-Q 

Location 13 contains 80 and 14 contains 02; 0280 matches the result of Example 14-5. 
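
The hex-to-decimal step can be confirmed with a few lines of Python. The bytes below are copied from the 0040:0010 row of the dump; the variable names are illustrative.

```python
# The sixteen bytes shown for 0040:0010 through 0040:001F in the dump above.
row_0040_0010 = bytes.fromhex("6F 94 00 80 02 40 02 00 00 00 22 00 22 00 66 21")

# Offsets 13H and 14H are the 4th and 5th bytes of this row (index 3 and 4).
size_kb = int.from_bytes(row_0040_0010[3:5], "little")

# 0040:0013 is the same physical location as 00413H.
physical = (0x0040 << 4) + 0x13
```

Here size_kb comes out as 0280H, that is 640 in decimal, which agrees with the value INT 12H returned in AX.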


Review Questions 


1. Find the logical address, CS, and IP values for the ISR if it is located at the ROM
physical address of FF065H.

2. In Question 1, if the ISR belongs to INT 10H, find the exact contents of memory 
locations in the interrupt vector table. 

3. Assume that after using the dump command of DEBUG to dump a section of the 
interrupt vector table, we have the following: 
0000:0030 ..4:..5.00...002: -57 EF 00 FF 00 00 00 00 
To what interrupt number does the above dump belong? 

4. For Question 3, show the logical address and exact values for CS and IP of the 
ISR. 

5. In a given embedded x86 PC, the documentation states that the motherboard comes
with 512K bytes of RAM. After running INT 12H, what value do we expect in the
AX register?

6. Which one is the last instruction in the ISR of INT 12H: RET or IRET? 


SECTION 14.3: 8259 PROGRAMMABLE INTERRUPT CONTROLLER 


The x86 has only pins INTR and INTA for interrupts, but one can use these two 
pins to expand the number of interrupts. Intel Corporation has provided an IC chip called 
the 8259 programmable interrupt controller (PIC) to make the job of expanding the num- 
ber of hardware interrupts much easier. See Figures 14-3 and 14-4. This section covers the 
8259 IC chip pins and programming options. Note that this section is about the 8259 chip. 
The ports and programs covered in this section are not related to the PC. 


CAS0-CAS2


CAS0, CAS1, and CAS2 can be used to set up several 8259 chips to expand the
number of hardware interrupts of the 8088/86 to 64 by cascading the 8259 chips in a
master/slave configuration. This section will focus on a single, non-cascaded 8259 (which
this chapter refers to as slave mode). Section 14.5 discusses both the master and slave
configurations as used in PC/AT type computers. To use the 8259 in this mode, the chip
must be programmed accordingly, and CAS0 to CAS2 are ignored.


SP/EN


SP/EN (slave programming/enable) in buffered mode is an output signal from the
8259 to activate the transceiver (EN). In nonbuffered mode it is an input signal into the
8259, SP = 1 for the master and SP = 0 for the slave.


INT


INT is an output that is connected to INTR of the x86.


INTA


INTA is input to the 8259 from INTA of the x86.


IR0-IR7


Inputs IR0 to IR7 (interrupt request) are used as hardware interrupts. When a HIGH is
put on any interrupt from IR0 to IR7, the 8088/86 will jump to a vector location. For each
IR there exists a physical memory location in the interrupt vector table. The x86 has 256
hardware or software interrupts (INT 00-INT FF).


Pin diagram:

        CS     1    28   Vcc
        WR     2    27   A0
        RD     3    26   INTA
        D7     4    25   IR7
        D6     5    24   IR6
        D5     6    23   IR5
        D4     7    22   IR4
        D3     8    21   IR3
        D2     9    20   IR2
        D1    10    19   IR1
        D0    11    18   IR0
        CAS0  12    17   INT
        CAS1  13    16   SP/EN
        GND   14    15   CAS2

Block diagram signals: D7-D0, RD, WR, A0, CS, INT, INTA, CAS0-CAS2, SP/EN,
interrupt inputs IR0-IR7, +Vcc, GND.


Figure 14-3. 8259A Programmable Interrupt Controller


8259 control words and ports


The four control words associated with the 8259 are ICW1 (initialization command
word), ICW2, ICW3, and ICW4. ICW3 is used in master mode only and is discussed in
Section 14.5. As can be seen from the pins of the 8259, there is only one address line, A0,
to communicate with the chip. Table 14-3 and Example 14-7 show the values that A0 and
CS must take to initialize the 8259.


Table 14-3: 8259 Initialization

CS   A0   Initialization
0    0    ICW1
0    1    ICW2, ICW3, ICW4
1    X    8259 is not addressed


ICW1 (initialization command word 1)


In looking at Table 14-3, the question might arise: How can the 8259 make a
distinction between ICW2, ICW3, and ICW4 when they are sent to the same
address? This is one of the functions of ICW1. D0, the LSB of ICW1, will tell the 8259
if it should look for ICW4 or not. In a similar manner, if D1 is high it knows that the
system is configured in single mode (one 8259 only, which this chapter also calls slave
mode) and it should not expect any ICW3 in the initialization
sequence. The initialization sequence must always start with ICW1, followed by ICW2,

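Example 14-7


Find the address for ICW1-ICW4 if chip select is activated by A7-A1 = 0010011.


Solution:

        A7  A6  A5  A4  A3  A2  A1  A0
        0   0   1   0   0   1   1   0    = 26H   Port for ICW1
        0   0   1   0   0   1   1   1    = 27H   Port for ICW2, ICW3, and ICW4

The above shows 26H to be the port for ICW1 and 27H the port for ICW2, ICW3, and ICW4.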
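
The port decoding can be expressed in a few lines of Python. This is a sketch; port_addresses is an illustrative name, and the input is the A7-A1 pattern written as a 7-character binary string.

```python
def port_addresses(a7_a1_bits):
    """Return (port with A0 = 0, port with A0 = 1) for a given A7-A1 pattern."""
    base = int(a7_a1_bits, 2) << 1      # A7-A1 occupy bits 7..1, A0 = 0
    return base, base | 1               # setting A0 = 1 selects the second port

icw1_port, icw234_port = port_addresses("0010011")
```

The first value is the port for ICW1 and the second the port shared by ICW2, ICW3, and ICW4.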




and finally the last one, if needed. There is no jumping ahead. D2 is always set low (= 0)
for the x86. D3 chooses between level triggering and edge triggering of the input signals
IR0-IR7. In edge triggering, a low-to-high input is recognized as an interrupt request. In
level triggering, a high on the IR is recognized as an interrupt request. D4 must always be
high. D5, D6, and D7 are all low for the x86 microprocessors (they are used only for the
8080/85). See Figure 14-5.


ICW2 (initialization command word 2) 


It is the function of ICW2 to assign interrupt numbers to IR0-IR7. The upper five
bits, D3-D7 (T3 through T7), are programmed by the user according to the INT type
assigned. The lower three bits, D0, D1, and D2, are supplied by the 8259 itself and vary
from 000 to 111, depending on which interrupt of IR0 to IR7 is activated. Together they
form the 8-bit INT type number assigned to the corresponding IR0 through IR7. See
Figure 14-5.


ICW3 (initialization command word 3) 


ICW3 is used only when two or more 8259s are cascaded. In this mode, a single
8259 can be connected to eight slave 8259s, thereby providing up to 64 hardware
interrupts. In cascade mode, there are separate ICW3 words for the master and the slave. For
the master, it indicates which IR has a slave connected to it, and a separate ICW3 informs
the slave which IR of the master it is connected to. See Figure 14-6.


ICW4 (initialization command word 4) 


D0 indicates the processor mode (PM), the choice of microprocessor. D0 equals
1 for the 8088/86 and 0 for the 8080/8085. When D1, which is AEOI (automatic end of
interrupt), is high it eliminates the need for an EOI instruction to be present before the
IRET (interrupt return) instruction in the interrupt service routine. When D1 is zero, the
EOI must be issued using the OCW (operation command word) to the 8259. In other
words, if D1 = 0, the last three instructions of the interrupt service routine for IR0-IR7
must be the two instructions that issue the EOI, followed by IRET. The significance of this
is discussed shortly when the OCW is discussed. D2 and D3 are for systems where data
buses are buffered with the use of bidirectional transceivers. The 8259 can work in either
buffered or nonbuffered mode. D4 is for SFNM (special fully nested mode). This mode
must be used when the 8259 is in master mode, and then D4 = 1; otherwise, it is 0. D5-D7
must be zero, as required by the 8259. See Figure 14-6 and Example 14-8.


Masking and prioritization of IR0-IR7 interrupts


One might ask what happens if more than one of interrupts IR0-IR7 is activated
at the same time. Can we mask any of the interrupts? What about responding to another
interrupt while an interrupt is being serviced? To answer all these questions, the function
of the OCW (operation command word) must be understood. This is discussed next.


[Block diagram not reproduced. Labeled blocks: in-service register (ISR), priority
resolver, interrupt request register (IRR), internal bus.]


Figure 14-4. Partial Block Diagram of the 8259A


ICW1

        A0    D7  D6  D5  D4  D3    D2  D1    D0
        0     0   0   0   1   LTIM  0   SNGL  IC4

        D0 (IC4):   1 = ICW4 needed, 0 = no ICW4 needed
        D1 (SNGL):  1 = single, 0 = cascade mode
        D2:         always 0 for the x86
        D3 (LTIM):  1 = level trig. input, 0 = edge trig. input
        D4:         always 1
        D5-D7:      always 0 for the x86

ICW2

        A0    D7  D6  D5  D4  D3  D2  D1  D0
        1     T7  T6  T5  T4  T3  T2  T1  T0

        T7 - T0 is the interrupt assigned to IR0 of the 8259


Figure 14-5. ICW Formats (ICW1 and ICW2) for the 8259


ICW3 (Master Device)

        A0    D7  D6  D5  D4  D3  D2  D1  D0
        1     S7  S6  S5  S4  S3  S2  S1  S0

        1 = IR input has a slave
        0 = IR input does not have a slave

ICW3 (Slave Device)

        A0    D7  D6  D5  D4  D3  D2   D1   D0
        1     0   0   0   0   0   ID2  ID1  ID0

        Slave ID   0  1  2  3  4  5  6  7
        ID0        0  1  0  1  0  1  0  1
        ID1        0  0  1  1  0  0  1  1
        ID2        0  0  0  0  1  1  1  1

ICW4

        A0    D7  D6  D5  D4    D3   D2   D1    D0
        1     0   0   0   SFNM  BUF  M/S  AEOI  PM

        D0 (PM):    1 = for x86, 0 = for 8085
        D1 (AEOI):  1 = auto EOI, 0 = normal EOI
        D3 D2 (BUF, M/S):
                    0 X = nonbuffered mode
                    1 0 = buffered mode slave
                    1 1 = buffered mode master
        D4 (SFNM):  1 = special fully nested mode
                    0 = not special fully nested mode


Figure 14-6. ICW Formats (ICW3 and ICW4) for the 8259




Example 14-8 


(a) Find the ICWs of the 8259 if it is used with an 8088/86 CPU, single, level triggering IRs,
and IR0 is assigned "INT 50H". The 8259 is in slave buffered mode with normal EOI.

(b) Show the program to initialize the 8259 using the port addresses in Example 14-7.

(c) Find the addresses associated with IR0, IR1, and IR2 in the interrupt vector table.

Note: This example is not PC-compatible and is given only for an exercise. 


Solution: 
(a) From Figure 14-5, we get the following for each of the ICWs:


ICW1

D0 = 1              ICW4 needed
D1 = 1              single
D2 = 0              this is always zero for x86 CPUs
D3 = 1              level triggering
D4 = 1              required by the ICW1 itself
D5 = D6 = D7 = 0    this is always zero for x86 CPUs


This gives ICW1 = 00011011 = 1BH. To get ICW2, look at Table 14-4. Always equate ICW2
to the INT # assigned to IR0: ICW2 = 01010000 = 50H. Notice that "INT nn", assigned to IR0,
can decide only bits D7-D3 (T7-T3 in Figure 14-5) of ICW2. This means that the "INT nn"
assigned to IR0 must have the lower three bits = 000; therefore, it can take only values of X0H
or X8H, where X is a hex digit. For example, "INT 45H" cannot be assigned to IR0.


No ICW3 is needed since it is single and not cascaded. 


ICW4

D0 = 1              8088/86
D1 = 0              normal (we must issue EOI before IRET instruction)
D2 = 0; D3 = 1      slave buffered mode
D4 = 0              not special fully nested mode
D5 = D6 = D7 = 0    required by the ICW4


We get ICW4 = 00001001 = 09H.
(b) The program is as follows: 


        MOV  AL,1BH     ;ICW1
        OUT  26H,AL     ;to port 26H
        MOV  AL,50H     ;ICW2
        OUT  27H,AL     ;to port 27H
        MOV  AL,09      ;ICW4
        OUT  27H,AL     ;to port 27H


(c) If "INT 50H" is assigned to IR0, then IR1 and IR2 have "INT 51H" and "INT 52H",
respectively, and so on. The vector memory locations associated with the IRs are as follows:


                                  Vector Location
IRQ (Pin of 8259)   INT       Logical Address       Physical Address
IR0                 INT 50H   0000:0140H-0143       00140H-00143
IR1                 INT 51H   0000:0144H-0147       00144H-00147
IR2                 INT 52H   0000:0148H-014B       00148H-0014B
and so on.
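
The bit packing in this example can be cross-checked with a Python sketch. The helper names icw1, icw2, icw4, and vector_address are made up for illustration; the bit positions follow Figures 14-5 and 14-6.

```python
def icw1(icw4_needed, single, level_triggered):
    """ICW1: D4 is always 1; D3 = LTIM, D1 = SNGL, D0 = IC4; other bits 0 for x86."""
    return 0x10 | (int(level_triggered) << 3) | (int(single) << 1) | int(icw4_needed)

def icw2(int_for_ir0):
    """ICW2 is the INT number assigned to IR0; its lower three bits must be 000."""
    if int_for_ir0 & 0x07:
        raise ValueError("INT type for IR0 must end in 0H or 8H")
    return int_for_ir0 & 0xFF

def icw4(x86, auto_eoi, buffered, master, sfnm):
    """ICW4: D4 = SFNM, D3 = BUF, D2 = M/S, D1 = AEOI, D0 = processor mode."""
    return ((int(sfnm) << 4) | (int(buffered) << 3) | (int(master) << 2)
            | (int(auto_eoi) << 1) | int(x86))

def vector_address(base_int, ir):
    """Physical address of the vector for a given IR input (INT type x 4)."""
    return ((base_int & 0xF8) | ir) * 4

example_icw1 = icw1(icw4_needed=True, single=True, level_triggered=True)
example_icw2 = icw2(0x50)
example_icw4 = icw4(x86=True, auto_eoi=False, buffered=True, master=False, sfnm=False)
```

Calling icw2(0x45) raises ValueError in this sketch, mirroring the rule that "INT 45H" cannot be assigned to IR0.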




Table 14-4: INT Number for Hardware Interrupts in Example 14-8

8259 Interrupt              Binary Data for ICW2
Input           INT Type    D7 D6 D5 D4 D3 D2 D1 D0
IR0             INT 50      0  1  0  1  0  0  0  0
IR1             INT 51      0  1  0  1  0  0  0  1
IR2             INT 52      0  1  0  1  0  0  1  0
IR3             INT 53      0  1  0  1  0  0  1  1
IR4             INT 54      0  1  0  1  0  1  0  0
IR5             INT 55      0  1  0  1  0  1  0  1
IR6             INT 56      0  1  0  1  0  1  1  0
IR7             INT 57      0  1  0  1  0  1  1  1


OCW (operation command word) 


After ICW1, ICW2, and ICW4 have been issued in sequence to the 8259 chip in
order to initialize it, the 8088/86 is ready to receive hardware interrupts through the 8259's
IR0-IR7 pins. After the process of initialization, the OCW (operation command word) can
be sent to mask any of IR0-IR7, or change the priority assigned to each IR. There are
three operation command words: OCW1, OCW2, and OCW3. See Table 14-5. With the
help of OCWs, a programmer can dynamically change the priority associated with each
of IR0-IR7, or mask any of them. Example 14-9 shows how the OCWs are sent to the
8259.


Table 14-5: Addresses for 8259 OCWs

CS   A0   Operation Command Word
0    0    OCW2, OCW3
0    1    OCW1
1    X    8259 is not addressed


Figure 14-7 shows the OCWs for the 8259. Below is a discussion of each. Before 
discussing the OCWs, the existence of three registers inside the 8259 must be noted. They 
are the ISR (in-service register), IRR (interrupt request register), and IMR (interrupt mask 
register). See Figure 14-4. 


Example 14-9 
Find the port addresses for the OCWs of the 8259 in Example 14-7. 


Solution: 


From Table 14-5 and Example 14-7:


        A7  A6  A5  A4  A3  A2  A1  A0
        0   0   1   0   0   1   1   0    = 26H   Port for OCW2 and OCW3
        0   0   1   0   0   1   1   1    = 27H   Port for OCW1


OCW1

        A0    D7  D6  D5  D4  D3  D2  D1  D0
        1     M7  M6  M5  M4  M3  M2  M1  M0

        Interrupt mask: 1 = mask set, 0 = mask reset

OCW2

        A0    D7  D6  D5   D4  D3  D2  D1  D0
        0     R   SL  EOI  0   0   L2  L1  L0

        R  SL  EOI
        0  0   1     Nonspecific EOI command
        0  1   1     Specific EOI command
        1  0   1     Rotate on nonspecific EOI command
        1  0   0     Rotate in automatic EOI mode (set)
        0  0   0     Rotate in automatic EOI mode (clear)
        1  1   1     Rotate on specific EOI command
        1  1   0     Set priority command
        0  1   0     No operation

        L2 L1 L0: IR level to be acted upon (000 = IR0 through 111 = IR7)

OCW3

        A0    D7  D6    D5   D4  D3  D2  D1  D0
        0     0   ESMM  SMM  0   1   P   RR  RIS

        RR RIS (read register commands):
                    0 X = no action
                    1 0 = read IR reg on next RD pulse
                    1 1 = read IS reg on next RD pulse
        P:          1 = poll command, 0 = no poll command
        ESMM SMM (special mask mode):
                    0 X = no action
                    1 0 = reset special mask
                    1 1 = set special mask


Figure 14-7. OCW Format for the 8259


OCW1 (operation command word 1) 


OCW1 is used to mask any of IR0-IR7. Logic 1 is for masking (disabling) and 0
is for unmasking (enabling). For example, 11111000 is the OCW1 to enable (unmask)
IR0, IR1, IR2, and disable (mask) the rest (IR3 through IR7). When this byte is written to
the 8259 (by making A0 = 1 and CS = low), it goes into the internal register called IMR
(interrupt mask register).

There are occasions when one needs to know which IRs are disabled and which
ones are enabled. In that case, simply read OCW1, which is the contents of IMR. For
example, to read OCW1 using the port addresses in Example 14-9, code "IN AL,27H".
By examining the contents of AL, one can find out which IRs are enabled and which ones
are disabled.


Example 14-10

Write the code to unmask (enable) IR0-IR7. Use the ports in Example 14-9.

Solution:

To enable IR0-IR7, use 0 for M0-M7 in OCW1 of Figure 14-7.
OCW1 = 0000 0000 = 00H

      MOV  AL,00      ;OCW1 to unmask IR0-IR7
      OUT  27H,AL     ;issue OCW1 to IMR
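
The relationship between the IRs to be enabled and the OCW1 byte can also be sketched in a few lines of Python. This is only an illustrative model, not code for the 8259 itself; the function names ocw1_mask and enabled_from_imr are hypothetical and do not come from the text.

```python
def ocw1_mask(enabled_irs):
    """Return the OCW1 byte that unmasks only the listed IR inputs (0-7)."""
    mask = 0xFF                       # start with every IR masked (all bits 1)
    for ir in enabled_irs:
        if not 0 <= ir <= 7:
            raise ValueError("IR number must be in the range 0-7")
        mask &= ~(1 << ir) & 0xFF     # a 0 bit unmasks (enables) that IR
    return mask


def enabled_from_imr(imr):
    """Interpret a byte read back from the IMR: list the IRs that are enabled."""
    return [ir for ir in range(8) if not (imr >> ir) & 1]
```

With this model, enabling IR0 through IR2 gives the byte 11111000 used in the text, and enabling all eight inputs gives the 00H of Example 14-10.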


OCW2 (operation command word 2) 


This command word is used to assign a specific priority to the IRs. Three methods
for assigning priority to IR0-IR7 are discussed below.


Fully nested mode 


This assigns the highest priority to IR0 and the lowest to IR7. In this case, if IR3
and IR5 are activated at the same time, first IR3 is served and then IR5. What happens if
IR3 is being served when both IR2 and IR4 request service? In that case, IR3 is put on
hold, then IR2 is served. After IR2 is served, IR3 is completed and finally IR4 is served.
This is the default mode when the 8259 is initialized. The 8259 can be programmed to
change the default mode to assign the highest priority to any IR. For example, the following
shows the priorities if IR6 has been assigned the highest priority; IR7 then has the
next priority, and so on.


IR0  IR1  IR2  IR3  IR4  IR5  IR6  IR7      interrupt pin on 8259
2    3    4    5    6    7    0    1        priority (0 = highest and 7 = lowest)


Automatic rotation mode


In this scheme, when an IR has been served it will take the lowest priority and will 
not be served until every other request has had a chance. This prevents interrupt starva- 
tion, where one device monopolizes the interrupt service. 


Specific rotation mode 


In this scheme the 8259 can be programmed to make the rotation follow a specific
sequence rather than IR0 to IR7, which is the case for the automatic rotation mode. In
this mode, the IR served will be stamped as the lowest priority, meaning that it will not be
served until every other request has had a chance. The only difference between this mode
and automatic rotation is the sequence of rotation.

Having concluded this brief description of priority schemes, we now discuss the
OCW2 bits. D2-D0 select the IR level on which the command acts. With the set priority
command, the selected IR becomes the lowest priority; for example, D2-D0 = 101 makes IR5
the lowest priority and therefore gives IR6 the highest. D4-D3 must always be 0 for OCW2.
D5, EOI (end of interrupt), is used to issue an end-of-interrupt command to the 8259. This
is a very important function and is widely used in IBM BIOS; it is discussed at length in
the following paragraphs. The D6 (SL, select) and D7 (R, rotation) bits are used to program
the 8259 for the various priority schemes discussed earlier. A frequently used bit
combination for OCW2 in the x86 PCs is 00100000 = 20H, the nonspecific EOI command.
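
As a rough illustration of how the OCW2 fields combine into one byte, the following Python sketch packs R, SL, EOI, and the IR level into the bit positions described above. The function name ocw2 and its keyword arguments are hypothetical, chosen only for this example.

```python
def ocw2(r=0, sl=0, eoi=0, level=0):
    """Pack the OCW2 fields: D7 = R, D6 = SL, D5 = EOI, D4-D3 = 0, D2-D0 = IR level."""
    if r not in (0, 1) or sl not in (0, 1) or eoi not in (0, 1):
        raise ValueError("R, SL, and EOI are single bits")
    if not 0 <= level <= 7:
        raise ValueError("IR level must be in the range 0-7")
    return (r << 7) | (sl << 6) | (eoi << 5) | level


# Nonspecific EOI: only the EOI bit is set, giving the familiar 20H.
NONSPECIFIC_EOI = ocw2(eoi=1)

# Specific EOI for IR3: SL and EOI set, level = 3.
SPECIFIC_EOI_IR3 = ocw2(sl=1, eoi=1, level=3)
```

Under these assumptions the nonspecific EOI evaluates to 20H and the specific EOI for IR3 to 63H; D4 and D3 stay 0 in every byte the function can produce.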


Importance of the EOI (end-of-interrupt) command 


Why is it necessary to issue an EOI command to the 8259? To understand the
answer to this question, consider the following case. Assume that an 8259 has been
initialized and is in the default fully nested mode (where IR0 has the highest priority and
IR7 the lowest). Now assume that IR3 is activated and the CPU acknowledges the interrupt by
sending back a signal through INTA. Then the CPU goes to the vector table, gets CS:IP of
the interrupt service routine, and starts to execute the routine. When the CPU acknowledges
IR3, the 8259 marks (sets to 1) the bit associated with IR3 in its ISR (in-service
register) to indicate that IR3 is being serviced now. Issuing EOI to the 8259 indicates
that servicing of IR3 is complete and that the bit associated with IR3 in the ISR can be
reset to zero, thereby allowing IR3 to come in again. The EOI must be issued at the end of
the service routine; if it were issued earlier, IR3 might keep interrupting itself again
and again. If it is not issued at all and the CPU goes back to the main program after it
finishes servicing IR3, IR3 cannot be serviced again, since the bit in the ISR still
indicates that IR3 is being serviced. The important point is that the last three
instructions of any interrupt service routine for IR0-IR7 must issue the EOI and then
execute IRET (see Example 14-11).

It should be noted that while IR3 is being serviced, IR2, IR1, and IR0 are all
allowed to come in and interrupt it since they have higher priority, but the lower-priority
interrupts IR4-IR7 are not responded to. For example, if IR3 is being serviced and IR1 is
activated, only IR0 can interrupt, and IR2 and IR4-IR7 will not be responded to. If the
programmer fails to issue the EOI at the end of the routines for IR1 and IR3, these two IRs
are put out of circulation, in addition to IR2, IR4, IR5, IR6, and IR7. Only IR0 will be
responded to by the 8259: the ISR bit for IR3 blocks IR3 and all the lower-priority
interrupts IR4-IR7, and the ISR bit for IR1 blocks IR1 and the lower-priority IR2.
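
The blocking behavior described above can be modeled with a small Python sketch. It is a simplified model of fully nested mode only (no masking, no rotation), and the names serviceable and nonspecific_eoi are hypothetical.

```python
def serviceable(isr):
    """Given the in-service register byte, list the IRs that can still interrupt.

    In fully nested mode only IRs with a higher priority (lower number) than
    the highest-priority in-service IR are responded to.
    """
    for ir in range(8):
        if (isr >> ir) & 1:
            return list(range(ir))
    return list(range(8))             # nothing in service: every IR is accepted


def nonspecific_eoi(isr):
    """Model a nonspecific EOI: reset the highest-priority bit that is set in the ISR."""
    for ir in range(8):
        if (isr >> ir) & 1:
            return isr & ~(1 << ir) & 0xFF
    return isr
```

With only IR3 in service, the model reports IR0 through IR2 as serviceable; with both IR1 and IR3 left in service because no EOI was issued, only IR0 remains, which is the situation the text warns about.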


Example 14-11

Show the last three instructions of the interrupt service routine for IR1 of the 8259 in
Example 14-8.

Solution:

Before the interrupt service routine returns control to the main program, it issues OCW2 to
the 8259.

INT_SERV  PROC FAR          ;routine for IR1

          MOV  AL,20H       ;the EOI byte for OCW2
          OUT  26H,AL       ;to port designated for OCW2
          IRET              ;return from ISR to the main program


OCW3 (operation command word 3) 


OCW3 is used, among other functions, to read the 8259's internal registers IRR
(interrupt request register) and ISR (in-service register). D0 and D1 allow the program to
read these registers in order to see which of IR0-IR7 is pending for service and which one
is being served. (As mentioned earlier, OCW1 is used to peek into the IMR.) The rest of
the bits are for changing the masking mode and other advanced functions of the 8259.
Interested readers should refer to Intel manuals.
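
As a small sketch of the read-register commands, the following Python function builds the OCW3 byte that selects either the IRR or the ISR for the next read. It assumes the standard 8259A bit layout (D4 = 0 and D3 = 1 identify OCW3, D1 = RR, D0 = RIS); the function name ocw3_read is hypothetical.

```python
OCW3_ID = 0b00001000   # D4 = 0, D3 = 1 mark the byte as OCW3
RR = 0b00000010        # D1: read-register command
RIS = 0b00000001       # D0: 0 selects IRR, 1 selects ISR


def ocw3_read(register):
    """Return the OCW3 byte that selects 'IRR' or 'ISR' for the next read."""
    if register == "IRR":
        return OCW3_ID | RR
    if register == "ISR":
        return OCW3_ID | RR | RIS
    raise ValueError("register must be 'IRR' or 'ISR'")
```

After one of these bytes is written to the OCW2/OCW3 port, a read from that same port returns the selected register.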




Review Questions 


1. A single 8259 can add up to ____ hardware interrupts to an x86 CPU.

2. INTR is an ________ (input, output) signal for the 8259 but it is an ________
(input, output) signal for the x86.

3. True or false. CAS0, CAS1, and CAS2 are used for master/slave mode only.

4. Indicate the logic level (high or low) on input pins A0 and CS needed to send
ICW1 to the 8259.

5. True or false. ICWs can be sent to the 8259 in random order.

6. The 8259 can receive ICWs in the sequence ICW1, ICW2, ICW3, ICW4 or ICW1,
ICW2, ICW4. How does it know which option is being programmed?

7. True or false. When the ISR (interrupt service routine) of IR5 is being executed, the
8259 prevents requests from the same interrupt by marking bit IR5 in its in-service
register (ISR).

8. True or false. Fully nested mode is the default mode.

9. In fully nested mode, which IR has the highest priority?

10. Assume that an 8259 is configured in fully nested mode and the CPU is executing
the interrupt service routine for IR5. During the execution of IR5, which interrupts
can come in and which ones are blocked?




SECTION 14.4: USE OF THE 8259 CHIP IN x86 PCs 


The original 8088-based PC used only one 8259 chip to extend the number of
hardware interrupts to eight, but in subsequent PCs, two 8259 chips were used to extend
the hardware interrupts up to 15. Since the original PC is a subset of today's x86 PC, we
discuss that first and examine the rest of the interrupts in a later section.


Interfacing the 8259 to the 8088 in IBM PC


To interface the 8259 to the 8088, there must be two port addresses assigned to the
8259. One is for ICW1 and the second one is for ICW2 and ICW4. Figure 14-8 shows the
address decoding for the 8259 in the IBM PC. Since the chip select is activated by Y1 and
all the x's for don't care must be zero, the addresses can be calculated as shown in
Table 14-6.

Figure 14-8. Chip Select Decoder of the 8259A


Table 14-6: Port Addresses of ICWs and OCWs

A0   Hex Address   Function
0    20            Port address ICW1
1    21            Port addresses ICW2, ICW3, ICW4
0    20            Port addresses OCW2, OCW3
1    21            Port address OCW1




Initialization words of the 8259 in the IBM PC 


Next the IBM PC initialization words for the 8259 will be explained. From earli- 
er discussions and Figure 14-5, the following configuration for the control words ICW1, 
ICW2, and ICW4 can be calculated: 


ICW1 IBM PC Configuration

D0 = 1             ICW4 needed
D1 = 1             Single
D2 = 0             For x86 this must be zero
D3 = 0             Edge triggering
D4 = 1             Required by ICW1 itself
D5 = D6 = D7 = 0   0s for 8088/86-based systems

ICW1 = 00010011 = 13H
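
The bit-by-bit construction of ICW1 can be checked with a short Python sketch. The function name icw1 and its parameter names are hypothetical; the bit positions follow the configuration listed above for 8088/86-based systems (D2 and D5-D7 left at 0).

```python
def icw1(icw4_needed=True, single=True, level_triggered=False):
    """Build ICW1 for an 8088/86-based system from the options in the text."""
    byte = 0x10                  # D4 = 1 identifies the byte as ICW1
    if icw4_needed:
        byte |= 0x01             # D0 = 1: ICW4 will follow
    if single:
        byte |= 0x02             # D1 = 1: single 8259, no cascading
    if level_triggered:
        byte |= 0x08             # D3 = 1: level triggering (0 = edge)
    return byte
```

Called with its defaults (ICW4 needed, single, edge triggered), the function returns 13H, the value derived above.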


As was explained in Section 14.2, out of the 256 interrupts of the 8088/86, IBM
PC designers assigned INT 08-INT 0F for expansion of hardware interrupts. These interrupts
are used by IR0-IR7 of the 8259 and are commonly referred to as IRQ0-IRQ7. INT 08
is for IRQ0, INT 09 is for IRQ1, and so on. It is the function of ICW2 to inform the 8259
which interrupt numbers are assigned to IRQ0-IRQ7. This is done by equating ICW2 of
the 8259 to the interrupt number assigned to IRQ0. In other words, ICW2 is the interrupt
number for IR0, which in the case of the IBM PC and compatibles is INT 08. The 8259 is
only programmed for the value of IRQ0, so the 8259 generates the INT numbers for IR1
through IR7. These are listed in Table 14-7. Summarizing the above discussion gives
ICW2 = 00001000 = 08H.


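Table 14-7: IBM PC Hardware Interrupts

Binary Data for ICW2           8259
D7 D6 D5 D4 D3  D2 D1 D0      Input    INT Type
0  0  0  0  1   0  0  0       IRQ0     INT 08
0  0  0  0  1   0  0  1       IRQ1     INT 09
0  0  0  0  1   0  1  0       IRQ2     INT 0A
0  0  0  0  1   0  1  1       IRQ3     INT 0B
0  0  0  0  1   1  0  0       IRQ4     INT 0C
0  0  0  0  1   1  0  1       IRQ5     INT 0D
0  0  0  0  1   1  1  0       IRQ6     INT 0E
0  0  0  0  1   1  1  1       IRQ7     INT 0F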
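
The way the 8259 derives the INT type for each input from ICW2 can be sketched as follows. This is an illustrative model; the function name int_type is hypothetical. The upper five bits come from ICW2 and the lower three bits are the number of the IR input being serviced.

```python
def int_type(icw2, ir):
    """Return the interrupt type the 8259 supplies for input IR (0-7)."""
    if not 0 <= ir <= 7:
        raise ValueError("IR number must be in the range 0-7")
    return (icw2 & 0xF8) | ir     # D7-D3 from ICW2, D2-D0 from the IR number


# With ICW2 = 08H, as in the IBM PC, IRQ0-IRQ7 map to INT 08H-INT 0FH.
pc_types = [int_type(0x08, ir) for ir in range(8)]
```

Because only the upper five bits of ICW2 are used, the interrupt type assigned to IR0 always has its lower three bits equal to zero.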


ICW3 is used only when multiple 8259 chips are connected in master/slave mode, 
which is discussed in Section 14.5. Next the ICW4 configuration will be examined. 


ICW4 IBM PC Configuration

D0 = 1             8088/86
D1 = 0             Normal EOI (issue EOI before IRET)
D2 = 0, D3 = 1     Buffered mode, slave
D4 = 0             Not special fully nested mode
D5 = D6 = D7 = 0   Required by ICW4

ICW4 = 00001001 = 09H




This gives the following code for 8259 initialization. 


      MOV  AL,13H     ;the ICW1
      OUT  20H,AL
      MOV  AL,8       ;the ICW2
      OUT  21H,AL
      MOV  AL,9       ;the ICW4
      OUT  21H,AL
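
The same initialization can be expressed as a list of (port, value) writes, which makes the order of the ICWs explicit. This Python sketch is only a model: the function names icw4 and pc_init_sequence are hypothetical, and the bit positions for ICW4 are those of the configuration table above.

```python
def icw4(x86_mode=True, auto_eoi=False, buffered=True,
         buffered_master=False, special_fully_nested=False):
    """Build ICW4 from its option bits (D5-D7 are always 0)."""
    byte = 0
    if x86_mode:
        byte |= 0x01             # D0 = 1: 8088/86 mode
    if auto_eoi:
        byte |= 0x02             # D1 = 1: automatic EOI (0 = normal EOI)
    if buffered:
        byte |= 0x08             # D3 = 1: buffered mode
        if buffered_master:
            byte |= 0x04         # D2 = 1: master, 0 = slave
    if special_fully_nested:
        byte |= 0x10             # D4 = 1: special fully nested mode
    return byte


def pc_init_sequence():
    """Return the writes used to initialize the 8259 in the IBM PC, in order."""
    return [
        (0x20, 0x13),            # ICW1 to the A0 = 0 port
        (0x21, 0x08),            # ICW2 to the A0 = 1 port
        (0x21, icw4()),          # ICW4 to the A0 = 1 port (no ICW3 in single mode)
    ]
```

The order matters: ICW1 must come first because it tells the 8259 whether ICW3 and ICW4 will follow.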


The IBM PC BIOS version of the above program is as follows: 


LOC   OBJ       LINE   SOURCE
0020            19     INTA00  EQU  20H       ;8259 PORT
0021            20     INTA01  EQU  21H       ;8259 PORT

                553    ; INITIALIZE THE 8259 INTERRUPT CONTROLLER CHIP
                554    ;------------------------------------------------
E1B4 Si ede: 
E1B4  B013      556            MOV  AL,13H     ;ICW1 - EDGE, SNGL, ICW4
E1B6  E620      557            OUT  INTA00,AL
E1B8  B008      558            MOV  AL,8       ;SETUP ICW2 - INT TYPE 8 (8-F)
E1BA  E621      559            OUT  INTA01,AL
E1BC  B009      560            MOV  AL,09      ;SETUP ICW4 - BUFFERED, 8086 MODE
E1BE  E621      561            OUT  INTA01,AL




Now that the 8259 is initialized, it is ready to accept an interrupt on any of the
inputs IRQ0-IRQ7, thereby expanding the number of hardware interrupts for the 8088/86.
What if the 8259 IC is defective? The 8259 is tested by a program in BIOS. There is a
collection of programs in BIOS that is responsible for testing and initialization of the
CPU and peripheral chips. This is commonly referred to as the POST (power-on self-test).

Since it is possible that one of the bits of the IMR (interrupt mask register) has
become stuck at zero or one during the fabrication of the chip and escaped detection, the
following program from IBM BIOS tests the IMR of the 8259 chip by writing 0s and 1s
to it and reading them back. Reminder: To access the IMR, use OCW1, which has the port
address 21H in the IBM PC (see Table 14-6). In the following program, if the test fails,
the system will beep.


                620    ; TEST THE IMR REGISTER
                621
E217  BA2100    622            MOV  DX,0021H   ;POINT INTR. CHIP ADDR 21
E21A  B000      623            MOV  AL,0       ;SET IMR TO ZERO
E21C  EE        624            OUT  DX,AL
E21D  EC        625            IN   AL,DX      ;READ IMR
E21E  0AC0      626            OR   AL,AL      ;IMR = 0?
E220  7515      627            JNZ  D6         ;GO TO ERR ROUTINE IF NOT 0
E222  B0FF      628            MOV  AL,0FFH    ;DISABLE DEVICE INTERRUPTS
E224  EE        629            OUT  DX,AL      ;WRITE TO IMR
E225  EC        630            IN   AL,DX      ;READ IMR
E226  0401      631            ADD  AL,1       ;ALL IMR BIT ON?
E228  750D      632            JNZ  D6         ;NO - GO TO ERR ROUTINE
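
The logic of this test can be followed in a short Python simulation. The FakeIMR class and its stuck_high and stuck_low parameters are hypothetical stand-ins for a real port, used here only to model stuck bits; the imr_test function mirrors the two write-and-read-back steps of the BIOS routine.

```python
class FakeIMR:
    """A stand-in for the IMR at port 21H, with optional stuck bits."""

    def __init__(self, stuck_high=0x00, stuck_low=0x00):
        self.stuck_high = stuck_high   # bits that always read as 1
        self.stuck_low = stuck_low     # bits that always read as 0
        self.value = 0

    def write(self, value):
        self.value = (value | self.stuck_high) & ~self.stuck_low & 0xFF

    def read(self):
        return self.value


def imr_test(port):
    """Return True if the IMR passes the all-0s and all-1s test."""
    port.write(0x00)
    if port.read() != 0x00:            # OR AL,AL / JNZ D6
        return False
    port.write(0xFF)
    if (port.read() + 1) & 0xFF != 0:  # ADD AL,1 wraps FFH to 00 only if all bits are 1
        return False
    return True
```

A healthy register passes both steps, while a register with any bit stuck at 1 fails the first step and one with any bit stuck at 0 fails the second.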




Sequences of hardware interrupts with the 8259 


When a high is put on any of IR0-IR7, how does the microprocessor become
involved? As mentioned earlier, INTA of the 8259 is connected to INTA of the 8288
bus controller, and INTR of the 8259 is connected to INTR of the 8088/86. The following
is the sequence of events after an IR of the 8259 is activated.

1. After an IR is activated, the 8259 will respond by putting a high on INTR, thereby
signaling the CPU for an interrupt request.

2. The 8088/86 puts the appropriate signals on S0, S1, and S2 (S0 = 0, S1 = 0, and S2
= 0), indicating to the 8288 that an interrupt has been requested.

3. The 8288 issues the first INTA to the 8259.

4. The 8259 receives the first INTA and does internal housekeeping, which includes
resolution of priority (if more than one IR has been activated) and resolution of
cascading.

5. The 8288 issues the second INTA to the 8259.

6. The second INTA pulse makes the 8259 put a single interrupt vector byte on the
data bus, which the 8088/86 will latch in. The value of the single byte depends on
ICW2 and which IR has been activated, as discussed earlier.

7. The 8088/86 uses this byte to calculate the vector location, which is four times the
value of the INT type.

8. The 8088/86 pushes the flag register onto the stack, clears IF (Interrupt Flag) and
TF (Trap Flag), thereby disabling further external interrupt requests and disabling
single-step mode, and finally pushes the present CS:IP registers onto the stack.

9. The 8088/86 reads CS:IP of the interrupt service routine from the vector table and
begins execution of the interrupt routine.
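
Steps 7 and 9 of the sequence above can be sketched in Python as a lookup in a simulated vector table. The function names and the sample vector bytes are hypothetical; the layout assumed is the standard one for the 8088/86, with the IP stored in the two low bytes of each 4-byte entry and the CS in the two high bytes, low byte first.

```python
def vector_table_address(int_type):
    """Step 7: the vector location is four times the INT type."""
    if not 0 <= int_type <= 0xFF:
        raise ValueError("INT type must be in the range 00-FFH")
    return int_type * 4


def read_vector(memory, int_type):
    """Step 9: fetch CS:IP of the service routine from the vector table."""
    base = vector_table_address(int_type)
    ip = memory[base] | (memory[base + 1] << 8)
    cs = memory[base + 2] | (memory[base + 3] << 8)
    return cs, ip


# A 1K vector table with one made-up entry for INT 09H (IRQ1).
memory = bytearray(1024)
memory[0x24:0x28] = bytes([0x87, 0xE9, 0x00, 0xF0])   # hypothetical CS:IP
```

For example, an interrupt on IRQ3 with ICW2 = 08H produces type 0BH, so the CPU reads its CS:IP from the four bytes starting at 0002CH.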




Next we see which devices in the IBM PC use the eight hardware interrupts, IRQ0
to IRQ7, of the 8259.


Sources of hardware interrupts in the IBM PC 


With the use of the 8259, the IBM PC has eight interrupts, IR0 to IR7, plus NMI
of the 8088/86. First the assignment of IR0 to IR7 will be discussed, then NMI and its use
in the IBM PC.

Of the eight interrupts for the 8259, IBM has used two, IR0 and IR1, for internal
use by the system. The other six, IR2 through IR7, are available through the expansion
slots. Of those used internally, IR0 is for channel 0 of the 8253 timer to update the time
of day (TOD) clock, and IR1 is dedicated to the keyboard. IR1 is activated whenever the
serial-in, parallel-out shift register of the keyboard has a byte of data. The following
two are used on the motherboard:


INT 08     IRQ0    Channel 0 of 8253 timer to update TOD
INT 09     IRQ1    Keyboard input data


The following are available through the expansion slot bus and used widely in 
industry, as indicated. Figure 14-9 summarizes the hardware interrupt assignment in the 
IBM PC. 


INT 0AH    IRQ2    Reserved
INT 0BH    IRQ3    Serial COM2
INT 0CH    IRQ4    Serial COM1
INT 0DH    IRQ5    Alternative printer
INT 0EH    IRQ6    Floppy disk
INT 0FH    IRQ7    Parallel printer LPT1


Sources of NMI in the IBM PC 


The last hardware interrupt to be discussed for the 8088/86 computer is the NMI
(nonmaskable interrupt). This interrupt is actually one of the pins of the CPU (similar to
the TRAP pin in the 8080/85), and unlike INTR there is no need for the INTA pin to
acknowledge it. Furthermore, it cannot be masked (disabled) by software, whereas INTR can
be masked at any time through use of the instruction CLI (clear interrupt flag). It is for
this reason that the IBM PC has used the NMI for parity bit checking of DRAM to make sure
that all read/write memory is working properly. Without working RAM, the operating system
could not be loaded and the computer could not function.

[Figure 14-9, diagram not reproduced. Labels: NMI sources (8087 interrupt request,
motherboard RAM parity check, I/O channel check), enabled using port A0H; IRQ0 from timer
channel 0 of the 8253; IRQ1 from the keyboard; IRQ2 to IRQ7 available through expansion
slots (COM2, COM1, parallel printer, floppy disk, parallel printer LPT1).]

[Diagram not reproduced. Labels: interrupt from 8087; SW2 of installation switch 1; PCK
parity check; enable I/O check from PB5 of 8255; I/O channel check to PC6 of 8255; WRT NMI
Reg through port address A0H; clear from RESET of CPU.]

Figure 14-10. Sources of NMI in the PC

If the NMI is so important to the system, which devices can activate it, and can 
they be masked at all? 




First, as can be seen from Figure 14-10, there are three sources of activation of 
the NMI: 


1. NPIRQ (numerical processor interrupt request) 
2. Read/write PCK (parity check) 
3. IOCHK (input/output channel check) 

Since three different sources can activate NMI, how does the system know which
one is requesting interrupt service at any given time? The IBM PC system recognizes
which of these interrupt requests has been activated by checking input port C of the 8255.
It looks at PC6 of the 8255 to see if it is IOCHK and at PC7 to see if it is PCK. The NMI
service routine software must check PC6 and PC7 and determine which one has requested
service. If neither of these two is requesting service, the request must have come from
the 8087 coprocessor on the motherboard (in IBM terminology, planar). IBM BIOS
checks the source of each and, as it finds them, displays an appropriate message on the
video screen.


The following code shows how BIOS detects the source of the NMI interrupt. 


E2C3            746            ORG  0E2C3H
E2C3            747    NMI_INT PROC NEAR
E2C3  50        748            PUSH AX            ;SAVE ORIG CONTENTS OF AX
E2C4  E462      749            IN   AL,PORT_C
E2C6  A8C0      750            TEST AL,0C0H       ;PARITY CHECK?
E2C8  7415      751            JZ   D14           ;NO, EXIT FROM ROUTINE
E2CA  ....      752            ....               ;ADDR OF ERROR MSG
E2CE  A840      753            TEST AL,40H        ;I/O PARITY CHECK
E2D0  7504      754            JNZ  D13           ;DISPLAY ERROR MSG
E2D2  ....      755            ....               ;MUST BE PLANAR
E2D6            756    D13:
                757
                ....           } sends the message to
                ....           } video and halts the
                               } system.
E2DF            762    D14:
E2DF  58        763            POP  AX            ;RESTORE ORIGINAL AX
E2E0  CF        764            IRET
                765    NMI_INT ENDP
(Reprinted by permission from "IBM BIOS Technical Reference" c. 1984 by International Business Machines Corporation)
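
The decision made by the NMI service routine, as described in the text, can be summarized in a short Python sketch. The function name nmi_source and the returned labels are hypothetical; the bit positions are those given above (PC6 for IOCHK, PC7 for PCK), and I/O channel check is tested first, as in the BIOS listing.

```python
PC6_IOCHK = 0x40    # port C bit 6: I/O channel check
PC7_PCK = 0x80      # port C bit 7: motherboard RAM parity check


def nmi_source(port_c):
    """Classify the source of an NMI from the byte read at port C of the 8255."""
    if port_c & PC6_IOCHK:
        return "IOCHK"      # error reported from an expansion slot
    if port_c & PC7_PCK:
        return "PCK"        # parity error in motherboard RAM
    return "NPIRQ"          # neither bit set: request came from the 8087
```

This mirrors the reasoning of the text rather than the exact control flow of the BIOS routine, which simply exits when neither bit is set.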


Is there any way that NMI can be masked? The answer is yes. As can be seen from
Figure 14-10, NMI is masked by a RESET signal from the CPU with CLR of the D flip-flop
when the computer is first turned on. It can also be unmasked or masked through port
A0H by setting D7 of the data bus to 1 (unmask) or 0 (mask). Again from the IBM PC
BIOS we see the following:


                1261   ; ENABLE NMI INTERRUPTS
                1262
E5BC  B080      1263           MOV  AL,80H        ;ENABLE NMI INTERRUPTS
E5BE  E6A0      1264           OUT  0A0H,AL


Review Questions


1. True or false. The original IBM PC used only one 8259. 

2. What ports are assigned to ICWs in the PC? 

3. In the PC, the IRQs are ________ (edge, level) triggered.

4. Of the 256 possible interrupts of the 8088, which ones are assigned to IRQ0-IRQ7
of the 8259?

5. True or false. IRQ0 and IRQ1 can be used by the system but not by the user.


6. Which IRQs of the 8259 are available on the expansion slot? 


7. True or false. The x86 can mask and unmask the NMI by using the STI and CLI 
instructions. 


8. True or false. If there is a problem with the memory of the PC, NMI is activated. 


SECTION 14.5: MORE ON INTERRUPTS IN x86 PCs 


When the first PC was introduced, only the six hardware interrupts, IRQ2-IRQ7,
were available through the 8-bit section of the expansion slot. The other two, IRQ0 and
IRQ1, were used by the motherboard. With the introduction of the 80286-based PC AT,
another eight interrupts, IRQ8-IRQ15, were added. IBM implemented the additional hardware
interrupts with the use of a second 8259 programmable interrupt controller. To make their
computers IBM compatible, all subsequent x86 PCs have remained faithful to the original
IBM PC. In this section we study the hardware interrupt assignment for x86 PCs.


x86 PC hardware interrupts 


In the design of the second generation of IBM PC, IBM designers had to make
sure that it was compatible with the 8088-based original PC. This led to the use of IRQ0
and IRQ1 for the system timer and keyboard, respectively, as was the case in the original
PC. IBM made the first 8259 a master, and added the second 8259 in slave mode. To do
that, it connected the INT pin of the slave 8259 to IRQ2 of the master 8259. The master
and slave 8259s communicate with each other through pins IRQ2, INT, CAS0, CAS1, and
CAS2. See Figure 14-11. Table 14-8 shows the interrupt assignment for x86 PCs.


[Diagram not reproduced. Labels: INTA and INTR of the CPU (80286, 80386, 80486, Pentium);
NMI sources: system board parity check and I/O channel parity check (expansion slot),
enabled by D7 of port 70H.]


Figure 14-11. 8259 Chips in Master/Slave Relation for 286 and x86 PCs 


The hardware interrupts on the ISA bus are shown in Figure 14-12. Notice on the
ISA expansion slot that IRQ10, IRQ11, IRQ12, IRQ14, and IRQ15 are on the 36-pin section
and IRQ9, IRQ3, IRQ4, IRQ5, IRQ6, and IRQ7 are on the 62-pin section.


The x86 microprocessor generated interrupts (exceptions) 
As mentioned in Section 14.1, when the CPU encounters an unusual situation
such as dividing a number by zero, it generates an exception. The 8088/86 had only one
exception, divide error or INT 0. In the 8088/86, Intel Corporation left the first 32
interrupts (INT 00 to INT 1FH) reserved for future microprocessors. However, designers of
the first IBM PC ignored this and assigned many of these interrupts to hardware and
software interrupts on the system. By not adhering to Intel's specifications, IBM has
created a massive headache for software designers of protected mode 386 and later systems.
This is due to the fact that Intel continued to assign the processor exception cases
generated by the x86 CPU to INT 5 and higher with each new member of the x86 family. These
are shown in Table 14-9. Many of the interrupts in Table 14-9 are used by the x86 in
protected mode. Since the Microsoft Windows NT/2000/XP and Vista operating systems use the
x86 in protected mode, they have mapped all these interrupts to new interrupts to avoid
interrupt conflict with the IRQs of the PC.


Table 14-8: Hardware Interrupt Assignment for ISA PCs

IRQ      INT Number   Assignment
IRQ0     INT 08H      System timer
IRQ1     INT 09H      Keyboard
IRQ2     INT 0AH      Cascade input from the slave 8259
IRQ3     INT 0BH      Serial port COM2 (and COM4)
IRQ4     INT 0CH      Serial port COM1 (and COM3)
IRQ5     INT 0DH      Parallel port 2: LPT2
IRQ6     INT 0EH      Floppy disk controller
IRQ7     INT 0FH      Parallel port 1: LPT1
IRQ8     INT 70H      CMOS real-time clock
IRQ9     INT 71H      Software redirected to INT 0AH
IRQ10    INT 72H      Available
IRQ11    INT 73H      Available
IRQ12    INT 74H      PS/2 mouse
IRQ13    INT 75H      Math coprocessor
IRQ14    INT 76H      Hard disk
IRQ15    INT 77H


Interrupt priority 


The next topic in this section is the concept of priority for INT 00 to INT FFH. 
What happens if two interrupts want the attention of the CPU at the same time? Which 
has priority? As far as the x86 is concerned, the INTR pin is considered a single interrupt. 
Therefore, the resolution of priority among the IRQs is up to the 8259. Assume that the 
INT instruction (such as INT 21H) and INTR both want to be processed. The INT instruc- 
tion has a higher priority than either INTR or NMI. If both NMI and INTR are activated 
at the same time, NMI is responded to first since NMI has a higher priority than INTR. 
Table 14-10 shows the interrupt processing order for the 80286 microprocessor from 
Intel's manual (1 is the highest priority). 

For the IRQs coming through INTR, the 8259 resolves priority depending on the
way the 8259 is programmed. In the x86 IBM PC, PS/2, and compatibles, IRQ0 has the
highest priority and IRQ7 is assigned the lowest priority. It must be noted that since IRQ8
to IRQ15 of the slave 8259 are connected to IRQ2 of the master 8259, they have higher
priority than IRQ3 to IRQ7 of the master 8259. Figure 14-13 shows the IRQ0-IRQ15
priority.
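
The effect of cascading on priority can be reproduced with a short Python sketch. The function name irq_priority_order and its cascade_ir parameter are hypothetical; the model simply assumes that both 8259s are in fully nested mode and that the slave's eight inputs take the place of the master input they are cascaded through.

```python
def irq_priority_order(cascade_ir=2):
    """Return IRQ numbers from highest to lowest priority for a master/slave pair."""
    order = []
    for ir in range(8):                 # master inputs, highest priority first
        if ir == cascade_ir:
            order.extend(range(8, 16))  # slave IRQ8-IRQ15 inherit this position
        else:
            order.append(ir)
    return order
```

With the slave on IRQ2, as in the PC AT, the resulting order is IRQ0, IRQ1, IRQ8 through IRQ15, and then IRQ3 through IRQ7.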

More about edge- and level-triggered interrupts 


As discussed previously, in the 8259 there are two ways to activate the interrupt 
input IRQ, depending upon how it is programmed. One is level-triggered mode and the 
other is edge-triggered mode. 


[Diagram not reproduced: pinout of the ISA slot between the rear panel and the component
side. Legible signal names include -I/O CH CK, -I/O CH RDY, AEN, SA19, SBHE, LA23-LA17,
-MEMR, -MEMW, and SD08-SD15.]


Figure 14-12. ISA (IBM PC AT) Bus Slot Signals 




Table 14-9: x86 Microprocessor Interrupt Assignment

Interrupt  8088/86                  286, 386, 486
00         Divide error             Divide error
01         Single step              Single step (386, 486: also debugging exceptions)
02         Nonmaskable interrupt    Nonmaskable interrupt
03         Breakpoint               Breakpoint
04         INTO detected overflow   INTO detected overflow
05         -                        Bound range exceeded
06         -                        Invalid opcode
07         -                        Coprocessor not available
08         -                        Double exception
09         -                        Coprocessor segment overrun
0A         -                        Invalid task state segment
0B         -                        Segment not present
0C         -                        Stack fault
0D         -                        General protection error
0E         -                        Page fault (386, 486 only)
10         -                        Coprocessor error
11         -                        Alignment check (486 only)

Note: Pentium has assigned a new exception called machine check with INT 12H.
(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1990)


Level-triggered mode 


In level-triggered mode, the 8259 will recognize a high on the IRQ input as an
interrupt request. The request on the IRQ line must remain high until the first INTA pulse
has been received by the 8259. After that, the high must be removed from the IRQ input
promptly: if the IRQ input is still high after the end-of-interrupt (EOI) command has been
issued, the 8259 will generate another interrupt on the same IRQ input. Therefore, to
avoid multiple interrupt generation, the IRQ input must be brought low before the EOI is
issued.
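
The consequence of leaving the IRQ line high can be seen in a deliberately simplified Python model. It samples the IRQ line once each time the 8259 is free to accept a new request (for example, after each EOI) and counts the interrupts generated. The function name count_interrupts and the sampling scheme are assumptions made for illustration; the edge-triggered branch is included only for contrast with the mode discussed next.

```python
def count_interrupts(samples, edge_triggered):
    """Count interrupts for a sequence of IRQ line levels (0 or 1).

    Each sample is the line level at a moment when the 8259 can accept a
    new request.
    """
    count = 0
    armed = True                  # edge mode: line must go low before re-triggering
    for level in samples:
        if edge_triggered:
            if level and armed:
                count += 1
                armed = False     # ignore the high level until the line drops
            elif not level:
                armed = True
        elif level:
            count += 1            # level mode: any high level is a new request
    return count
```

If the line stays high across three sampling points and then drops, the level-triggered model generates three interrupts while the edge-triggered model generates one.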


Edge-triggered mode 


In edge-triggered mode, the 8259 will recognize an interrupt request only when a
low-to-high pulse is applied to an IRQ input. This means that after the low-to-high
transition on the IRQ input, the 8259 will signal the interrupt request by activating
INTR, and the high level on the input will not generate further interrupts even after the
EOI is issued. Therefore, the designer does not need to worry about quickly removing the
high to avoid generating multiple interrupts, as is the case for level-triggered mode.
This is due to the fact that in edge-triggered mode, before another interrupt can be
requested, the IRQ input must be brought back to low. Notice in both edge- and
level-triggered modes that the IRQ must stay high until after the falling edge of the
first INTA pulse in order for the interrupt request to be acknowledged. It is interesting
to note the role of the IRR (interrupt request register) of the 8259. In level-triggered
mode, the IRR latch is always ready to recognize a high on the IRQ as a request for
interrupt. But in edge-triggered mode, the IRR latch is disabled after the request is
acknowledged and will not latch another interrupt until that IRQ input goes back to low.
The disadvantage of edge-triggered mode is the problem of false interrupts caused by a
good-sized spike as a result of noise on the IRQ line, especially in high-speed systems.


Table 14-10: 80286 Interrupt Priority

Order   Interrupt
1       INT instruction or exception
2       Single step
3       NMI
4       Processor extension segment overrun
5       INTR


HIGHEST PRIORITY
IRQ0, IRQ1, IRQ8, IRQ9, IRQ10, IRQ11, IRQ12, IRQ13, IRQ14, IRQ15, IRQ3, IRQ4, IRQ5, IRQ6, IRQ7
LOWEST PRIORITY

Figure 14-13. IRQ Priority in the x86 PC


Review Questions


1. True or false. IRQ13 is used for math processor error detection in x86 PCs.

2. Which portion of the ISA bus has the IRQ10?

3. Which has the higher priority, IRQ10 or IRQ7?

4. Which has the higher priority, IRQ1 or IRQ13?

5. With x86 processors, an invalid instruction causes an exception. Which interrupt is
assigned to it by Intel?


PROBLEMS 


SECTION 14.1: 8088/86 INTERRUPTS 


1. Assume that the 8088/86 is executing an instruction with 17 clock counts. Meanwhile,
the INTR pin is activated. Does the CPU finish the current instruction before it
responds to INTR? How does the CPU resume from where it left off?

2. Give the logical and physical addresses in the interrupt vector table associated with
each of the following interrupts.
(a) INT 5 (b) INT 21H

3. What does ISR stand for, and what is it? Give another name for ISR.

4. Where is the address of each ISR kept?

5. Compare the number of bytes of stack memory used by each of the following.
(a) CALL FAR (b) interrupt activation

6. Vector table addresses 003F8H-003FBH belong to which interrupt?

7. Give the logical and physical addresses used by the interrupt vector table.




8. How many bytes are used by the interrupt vector table, and why? 

9. Why should we not use the first 1K of address space in 8088/86-based systems? 

10. Indicate the interrupt(s) set aside for exception handling in the 8088/86. 

11. Give the interrupt number (type) assigned to each of the following. 
(a) divide error (b) single step (c) NMI 

12. True or false. When an interrupt through INTR is executed, IF = 0 and TF = 0. 

13. True or false. CLI blocks both INTR and NMI. 

14. Show how to set TF to high. 

15. Show how to set IF to each of the following. (a) low (b) high 

16. True or false. Instruction INTO is executed only if the overflow flag is high
(OF = 1).

17. The last instruction in the ISR is ________, whereas the last instruction in a FAR
subroutine is ________.

18. What is the difference between RETF and IRET in terms of stack activity?

19. Show the stack frame where CS, IP, and FR are stored for both an interrupt and a
CALL FAR routine. Assume that SP = FFE0H.

20. In which of the following sequences are the stack contents popped off by IRET?
(a) IP, FR, CS (b) FR, IP, CS (c) FR, CS, IP (d) none of the above


SECTION 14.2: x86 PC AND INTERRUPT ASSIGNMENT 


21. Answer the following questions, assuming that vector table locations 0000:001C to 
0000:001F have the contents indicated below. 
0000:001C    47 FF 00 F0
(a) Which interrupt does this belong to? 
(b) What is the logical address and physical address of the ISR? 

22. In Problem 21, does BIOS or DOS provide the service? 

23. INT 12H provides the size of which of the following memories? 
(a) high memory area (b) extended memory 
(c) conventional memory (d) expansion memory installed in the expansion slot 

24. If the start of an ISR is located in BIOS ROM at FFE6EH, what are the values of CS 
and IP in the vector table? 

25. In Problem 24, if the ISR belongs to INT 1CH, show the exact contents of the vector 
table. 

26. In what BIOS data area location is the size of conventional memory stored? 


SECTION 14.3: 8259 PROGRAMMABLE INTERRUPT CONTROLLER 
Note: These problems do not necessarily apply to IBM PC compatibles. 


27. True or false. In the 8259, to program ICW1, we must have A0 = 1 and CS = 0.

28. For the 8259, indicate which of the following is input and which is output.
(a) IR0-IR7 (b) INT (c) INTA
(d) A0 (e) CS (f) RD

29. Find the addresses for each ICW of the 8259 if CS is activated by A7-A1 = 1001 010.

30. Find the ICW1 and ICW2 if the 8259 is used with an 8088/86, single, edge triggering,
no ICW4, and IR0 is assigned INT 88H.

31. Show the programming of ICW1 and ICW2 in Problem 30. Use the port addresses of
Problem 29.

32. Which of the following interrupts cannot be assigned to IR0 of the 8259, and why?
(a) 99H (b) 98H (c) CCH
(d) 22H (e) 10H (f) F8H

33. Find the INT type number assigned to IR0 and IR7 if IR3 is assigned INT 1BH.

34. Find the INT number assigned to IR0, IR4, and IR6 if IR2 is assigned INT 32H.

35. Which of the OCWs is used to mask a given IR of the 8259?

36. EOI is issued by which of the OCWs?


37. What is the default mode for the prioritization of IR0 to IR7?

38. Find the port addresses assigned to each of the OCWs in Problem 29.

39. Show the program to enable IR2 and IR4 and mask the rest of the IRs. Use the port
addresses in Problem 38.

40. OCW2, OCW3, and ICW1 go to the same port address in the 8259 when A0 = 0 and
CS = 0. How does the 8259 distinguish between them? How does it distinguish
between OCW2 and OCW3 since both go to the same port address?


SECTION 14.4: USE OF THE 8259 CHIP IN x86 PCs 


41. Why is signal AEN used in accessing the 8259 in the PC?

42. The PC uses the 8259 in ______ (single, cascade) mode.

43. Indicate the IRQs level of triggering in the IBM PC (edge triggered, level triggered).

44. What interrupt numbers are assigned to the 8259 in the PC?

45. What port addresses are assigned to the 8259 in the PC?

46. True or false. IRQ0 and IRQ1 are used by the system board and are not available.

47. Which of the IRQs of the 8259 are available on the expansion slot?

48. Indicate on which side, A or B, of the expansion slot IRQs are located.

49. What are the binary and hex values for the EOI, and to which port is it issued in the
IBM PC?

50. Which IRQ has the highest priority, and why?

51. True or false. The 8288 chip issues two INTAs to the 8259 when INTR of the 8088 is
activated.

52. True or false. In the PC, there is more than one source of NMI activation.

53. In the PC, can NMI be blocked? If yes, how?

54. True or false. In the PC, the parity bit error from both the memory of the system board
and the memory board of the expansion slot can activate NMI.


SECTION 14.5: MORE ON INTERRUPTS IN x86 PCs 


55. True or false. In the x86 PC, INTR of the x86 comes from the primary (master) 8259
chip.

56. True or false. INTA from the x86 goes to both 8259 chips in the x86 PC.

57. In the x86, what port addresses are assigned to the 8259s? See Appendix E.

58. In the x86 PC, what interrupt numbers are assigned to the second 8259 chip?

59. What IRQs are available on the ISA expansion slot?

60. Why is there no IRQ2 on the ISA expansion slot? Is there any replacement for it?

61. Of IRQ10 and IRQ4, which has the higher priority, and why?

62. True or false. With every generation of the x86, more exception interrupts are added
but they are downward compatible.

63. True or false. The INT instruction and the exception interrupt have a lower priority
than NMI.

64. True or false. The NMI has a higher priority than INTR.

65. If NMI, IRQ10, and IRQ6 are all activated at the same time, explain the sequence
when the system responds and executes them.

66. If IRQ3, IRQ7, and IRQ15 are all activated at the same time, in what order are they
serviced?


ANSWERS TO REVIEW QUESTIONS 
SECTION 14.1: 8088/86 INTERRUPTS 


1. True
2. 4
3. 1K bytes, beginning at 00000H and ending at 003FFH
4. Interrupt service routine (ISR) or interrupt handler
5. To hold the CS:IP of each ISR
6. 00040H, 41H, 42H, and 43H
7. No; it is internally embedded into the CPU.
8. INT 0


SECTION 14.2: x86 PC AND INTERRUPT ASSIGNMENT 


1. F000:F065, CS = F000 and IP = F065
2. INT 10H is assigned memory locations 00040H, 41H, 42H, and 43H in the interrupt
   vector table. That means that we have 00040 = (65), 00041 = (F0), 00042 = (00), and
   00043 = (F0).
3. The dash (-) tells us this is the 8th boundary; therefore, the 0000:0038 address in the
   interrupt vector table belongs to INT 14 (0E hex).
4. The logical address is F000:EF57; therefore, CS = F000 and IP = EF57.
5. AX = 200
6. IRET


SECTION 14.3: 8259 PROGRAMMABLE INTERRUPT CONTROLLER 


1. 8
2. Output, input
3. True
4. CS = 0 and A0 = 0
5. False
6. The bit D1 in ICW1 indicates if it is for single or cascade. If it is cascade, it expects
   to receive all ICWs, from 1 to 4, but if it is single, it does not expect ICW3.
7. True
8. True
9. IR0
10. Higher priority interrupts IR0, IR1, IR2, IR3, IR4 can come in, but IR6 and IR7 are
    blocked since they have lower priority.


SECTION 14.4: USE OF THE 8259 CHIP IN x86 PCs 


1. True
2. 20H and 21H
3. Edge triggered
4. INT 08 to INT 0FH
5. True
6. IRQ2 through IRQ7
7. False
8. True


SECTION 14.5: MORE ON INTERRUPTS IN x86 PCs 


1. True
2. 62-pin section
3. IRQ10
4. IRQ1
5. INT 06


CHAPTER 15 


DIRECT MEMORY ACCESS AND 
DMA CHANNELS IN x86 PC 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


Describe the concept of DMA, direct memory accessing 

List the pins of the 8237A DMA chip and describe their functions 
Explain how bus arbitration is achieved between DMA and the CPU 
Explain how the channels of the 8237 are used in the PC 

List the DMA signals in the ISA bus 


For a computer to work efficiently, there must be a way to transfer a large amount 
of data in a short amount of time. In the IBM PC, this is accomplished with the help of 
what is called direct memory access (DMA), and that is the subject of this chapter. 


SECTION 15.1: CONCEPT OF DMA 


In computers there is often a need to transfer a large number of bytes of data 
between memory and peripherals such as disk drives. In such cases, using the micro- 
processor to transfer the data is too slow since the data first must be fetched into the CPU 
and then sent to its destination. In addition, the process of fetching and decoding the 
instructions themselves adds to the overhead. For this reason, Intel created the 8237 
DMAC (direct memory access controller) chip, whose function is to bypass the CPU and 
provide a direct connection between peripherals and memory, thus transferring the data as 
fast as possible. While the 8237 can transfer a byte of data between an I/O peripheral and 
memory in only 4 clocks, the 8088 would take 39 clocks: 


                                Number of Clocks
BACK:   MOV   AL,[SI]           10
        OUT   PORT,AL           10
        INC   SI                2
        LOOP  BACK              17
                                ;total clocks 39
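
The arithmetic behind this comparison can be checked with a short sketch. The Python below is only an illustration: the clock counts are the ones listed above, the 2048-byte block size is an arbitrary example, and all names are hypothetical.

```python
# Clock counts per instruction, taken from the 8088 listing above.
CPU_LOOP_CLOCKS = {
    "MOV AL,[SI]": 10,
    "OUT PORT,AL": 10,
    "INC SI": 2,
    "LOOP BACK": 17,
}
DMA_CLOCKS_PER_BYTE = 4  # 8237 figure quoted in the text


def cpu_clocks_per_byte():
    """Total clocks the 8088 loop spends moving one byte."""
    return sum(CPU_LOOP_CLOCKS.values())


def transfer_clocks(num_bytes, clocks_per_byte):
    """Clocks needed to move num_bytes at a fixed per-byte cost."""
    return num_bytes * clocks_per_byte


if __name__ == "__main__":
    block = 2048  # example block size in bytes (an assumption)
    print("CPU clocks per byte:", cpu_clocks_per_byte())
    print("CPU clocks for block:", transfer_clocks(block, cpu_clocks_per_byte()))
    print("DMA clocks for block:", transfer_clocks(block, DMA_CLOCKS_PER_BYTE))
```

The sketch ignores wait states and the bus handover time, so it only compares the per-byte figures given in the text.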


One problem with using DMA is that there is only one set of buses (one set of each 
bus: data bus, address bus, control bus) in a given computer and no bus can serve two mas- 
ters at the same time. The buses can be used either by the main x86 CPU or the 8237 
DMA. Since the x86 has primary control over the buses, it must give permission to DMA 
to use them. How is this done? The answer is that any time the DMA needs to use the buses 
to transfer data, it sends a signal called HOLD to the CPU and the CPU will respond by 
sending back the signal HLDA (hold acknowledge) to indicate to the DMA that it can go 
ahead and use the buses. While the DMA is using the buses to transfer data, the CPU is 
sitting idle, and conversely, when the CPU is using the bus, the DMA is sitting idle. After 
DMA finishes its job it will make HOLD go low and then the CPU will regain control over 
the buses. See Figure 15-1. 

For example, if the DMA is to transfer a block of data from memory to an I/O 
device such as a disk, it must know the address of the beginning of the block (address of 
the first byte of data) and the number of bytes (count) it needs to transfer. Then it will go 
through the following steps. 


1. The peripheral device (such as the disk controller) will request the service of DMA 
by pulling DREQ (DMA request) high. 

2. The DMA will put a high on its HRQ (hold request), signaling the CPU through its 
HOLD pin that it needs to use the buses. 

3. The CPU will finish the present bus cycle and respond to the DMA request by put- 
ting high on its HLDA (hold acknowledge), thus telling the 8237 DMA that it can 
go ahead and use the buses to perform its task. HOLD must remain active high as 
long as DMA is performing its task. 

4. DMA will activate DACK (DMA acknowledge), which tells the peripheral device 
that it will start to transfer the data. 

5. DMA starts to transfer the data from memory to the peripheral by putting the
address of the first byte of the block on the address bus and activating MEMR,
thereby reading the byte from memory into the data bus; it then activates IOW to
write the data to the peripheral. Then DMA decrements the counter and increments
the address pointer and repeats this process until the count reaches zero and the task
is finished.


6. After the DMA has finished its job it will deactivate HRQ, signaling the CPU that 
it can regain control over its buses. 
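
The six steps above can be sketched as a small software model. This is a simplified illustration, not a description of the chip's internal logic: it follows the text's description (count down to zero), represents memory as a Python dictionary, and uses hypothetical names throughout.

```python
def dma_memory_to_io(memory, base_address, count):
    """Model a memory-to-I/O DMA transfer as described in steps 1 to 6.

    memory       -- dict mapping 16-bit addresses to byte values (assumption)
    base_address -- address of the first byte of the block
    count        -- number of bytes to transfer
    Returns (bytes delivered to the peripheral, ordered log of signal events).
    """
    # Steps 1-4: request and acknowledge handshake.
    log = ["DREQ high", "HRQ high", "HLDA high", "DACK active"]
    address = base_address
    delivered = []
    # Step 5: one memory read plus one I/O write per byte.
    while count > 0:
        log.append(f"MEMR @ {address:04X}H")   # byte is placed on the data bus
        delivered.append(memory[address])
        log.append("IOW")                      # peripheral takes the byte
        address = (address + 1) & 0xFFFF       # 16-bit address pointer
        count -= 1                             # count register
    # Step 6: the buses are handed back to the CPU.
    log.append("HRQ low")
    return delivered, log


if __name__ == "__main__":
    ram = {0x3400: 0x11, 0x3401: 0x22, 0x3402: 0x33}
    data, events = dma_memory_to_io(ram, 0x3400, 3)
    print(data)
    for event in events:
        print(event)
```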


The above discussion indicates that DMA can only transfer information; unlike
the CPU, it cannot decode and execute instructions. Notice also that when the CPU
receives a HOLD request from DMA, it finishes the present bus cycle (but not necessarily
the present instruction) before it hands over control of the buses to the DMA. This is in
contrast to a hardware interrupt, in which the CPU finishes the present instruction before
it responds with INTA. One could look at the DMA as a kind of CPU without the instruction
decoder/execution logic circuitry. For the DMA to be able to transfer data it is equipped
with the address bus, data bus, and control bus signals IOR, IOW, MEMR, and MEMW.


[Figure labels: Disk Controller; Data Bus; Address Bus; Control Bus (IOR, IOW, MEMR, MEMW)]


Figure 15-1. DMA Usage of System Bus 


Review Questions 


1. True or false. When the DMA is working, the CPU is sitting idle. 

2. True or false. When the CPU is working, the DMA is sitting idle. 

3. True or false. No bus can serve two masters at the same time. 

4. True or false. The main CPU (x86) has control over all the system buses. 
5. To get control over the system bus, the ______ (INTR, HOLD) pin of the x86 is
activated.

6. The x86 CPU informs the peripheral that it relinquishes control over the system bus
through its ______ pin.

7. The HOLD is an ______ (input, output) for the x86 CPU.

8. The HLDA is an ______ (input, output) for the x86 CPU.




SECTION 15.2: 8237 DMA CHIP PROGRAMMING 


The Intel 8237 DMA controller is a 40-pin chip. It has four channels for transfer- 
ring data, and each must be used for one device. For example, one is used for the floppy 
disk, one for the hard disk, and so on. Of course, only one device can use the DMA to 
transfer data at a given time. With every channel there are two associated signals, DREQ 
(DMA request) and DACK (DMA acknowledge). DREQ is an input to DMA coming from 
the peripheral device (such as the hard disk controller) and DACK is an output signal from 
the 8237 going to the peripheral device. From the 8237 DMA, there is only one HOLD 
and one HLDA that are connected to HOLD and HLDA of the x86. This means that four 
channels from four different devices can request use of the system buses, but DMA decides 
who gets control based on the way its priority register has been programmed. Every chan- 
nel of the 8237 DMA must be initialized separately for the address of the data block and 
the count (the size of the block) before it can be used. This initialization involves writing 
into each channel: 


1. The address of the first byte of the block of data that must be transferred (called the 
base address). 
2. The number of bytes to be transferred (called the word count). 


After initialization, each channel can be enabled and controlled with the use of a
control word. There are many modes of operation, and these various modes and options
must be programmed into the 8237's internal registers. To access these registers, the 8237
provides four address pins, A0-A3, along with the CS (chip select) pin. Since each channel
needs separate addresses for the base address and the word count, a total of eight ports
is set aside for those alone. Table 15-1 shows the internal addresses of the 8237 registers
for each channel. Example 15-1 shows how these addresses are generated.


Example 15-1 


Find the port addresses for the base address and word count of each channel of the 8237 for Figure
15-2 (CS is activated by A7-A4 = 1001 binary).


Solution: 


From Table 15-1, one can get the addresses found in Table 15-3. 


Figure 15-2. Diagram for Example 15-1 
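
The address generation in Example 15-1 can also be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming that A7-A4 select the chip and A3-A0 select the register as in Table 15-1; the function name is hypothetical.

```python
def channel_ports(cs_nibble):
    """Return the address and count register ports for channels 0 to 3.

    cs_nibble -- value of A7-A4 that activates CS (1001 binary in Example 15-1)
    Per Table 15-1, A3-A0 = 2*channel selects the address register and
    A3-A0 = 2*channel + 1 selects the word count register.
    """
    base = cs_nibble << 4
    ports = {}
    for channel in range(4):
        ports[channel] = {
            "address": base | (2 * channel),
            "count": base | (2 * channel + 1),
        }
    return ports


if __name__ == "__main__":
    for channel, regs in channel_ports(0b1001).items():
        print(f"CHAN{channel}: address {regs['address']:02X}H, "
              f"count {regs['count']:02X}H")
```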




Table 15-1: 8237 Internal Addresses for Writing Transfer Addresses and Counts 


Channel  Register                      Operation  CS  IOR  IOW  A3  A2  A1  A0
0        Base and current address      Write      0   1    0    0   0   0   0
         Current address               Read       0   0    1    0   0   0   0
         Base and current word count   Write      0   1    0    0   0   0   1
         Current word count            Read       0   0    1    0   0   0   1
1        Base and current address      Write      0   1    0    0   0   1   0
         Current address               Read       0   0    1    0   0   1   0
         Base and current word count   Write      0   1    0    0   0   1   1
         Current word count            Read       0   0    1    0   0   1   1
2        Base and current address      Write      0   1    0    0   1   0   0
         Current address               Read       0   0    1    0   1   0   0
         Base and current word count   Write      0   1    0    0   1   0   1
         Current word count            Read       0   0    1    0   1   0   1
3        Base and current address      Write      0   1    0    0   1   1   0
         Current address               Read       0   0    1    0   1   1   0
         Base and current word count   Write      0   1    0    0   1   1   1
         Current word count            Read       0   0    1    0   1   1   1

(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1983)

Table 15-2: 8237 Internal Addresses for Commands/Status 


A3  A2  A1  A0  IOR  IOW  Operation
1   0   0   0   0    1    Read status register
1   0   0   0   1    0    Write command register
1   0   0   1   0    1    Illegal
1   0   0   1   1    0    Write request register
1   0   1   0   0    1    Illegal
1   0   1   0   1    0    Write single mask register bit
1   0   1   1   0    1    Illegal
1   0   1   1   1    0    Write mode register
1   1   0   0   0    1    Illegal
1   1   0   0   1    0    Clear byte pointer flip-flop
1   1   0   1   0    1    Read temporary register
1   1   0   1   1    0    Master clear
1   1   1   0   0    1    Illegal
1   1   1   0   1    0    Clear mask register
1   1   1   1   0    1    Illegal
1   1   1   1   1    0    Write all mask register bits

(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1983)




Table 15-3: 8237 Address Selection for Example 15-1 


Binary address            Hex
A7 A6 A5 A4 A3 A2 A1 A0   Address  Register                        Read/Write
1  0  0  1  0  0  0  0    90       CHAN0 memory address register   R/W
1  0  0  1  0  0  0  1    91       CHAN0 count register            R/W
1  0  0  1  0  0  1  0    92       CHAN1 memory address register   R/W
1  0  0  1  0  0  1  1    93       CHAN1 count register            R/W
1  0  0  1  0  1  0  0    94       CHAN2 memory address register   R/W
1  0  0  1  0  1  0  1    95       CHAN2 count register            R/W
1  0  0  1  0  1  1  0    96       CHAN3 memory address register   R/W
1  0  0  1  0  1  1  1    97       CHAN3 count register            R/W


The two sets of information needed in order to program a channel of the 8237 
DMA to transfer data are (1) the address of the first byte of data to be transferred, and (2) 
how many bytes of data are to be transferred. 

For set 1, the channel's memory address register must be programmed. Since the 
memory address register of the 8237 is 16 bits and the data bus of the 8237 is 8 bits, one 
byte at a time, consecutively, is sent in to the same port address. For set 2, the channel's 
count register is programmed. The count can go as high as FFFFH. Since the count regis- 
ter is 16 bits and the data bus of the DMA is only 8 bits, it takes two consecutive writes to 
program that register. This is shown in Example 15-2. 


Example 15-2 




Assume that channel 2 of the DMA in Example 15-1 is to transfer a 2K (2048) byte block of data 
from memory locations starting at 53400H. Use the port addresses of Example 15-1 for the DMA 
to program the memory address register and count register of channel 2. 


Solution: 


The port addresses for the channel 2 memory address register and count register in Example 15-1 
are 94H and 95H, respectively. The initialization will look as follows: 


        MOV  AX,3400H   ;load lower 4 digits of start address
        OUT  94H,AL     ;send out the low byte of the address
        MOV  AL,AH      ;move the high byte into AL
        OUT  94H,AL     ;send out the high byte of the address
        MOV  AX,2048    ;load block size into AX
        OUT  95H,AL     ;send out the low byte of the count
        MOV  AL,AH      ;move the high byte into AL
        OUT  95H,AL     ;send out the high byte of the count


The contents of the memory address and count registers can be read in the same manner (low byte 
first, then high byte) to monitor these registers at any time. From looking at the above program one 
might ask, since the system address bus is 20 bits and the memory address is 53400H, why does this 
program use 16-bit addresses? This is a limitation of the 8237 DMA. In the 8237, not only is the 
register holding the address of the block 16 bits, but in addition there are only 16 address pins that 
carry the addresses. The x86 PC solves this problem by using an external n-bit register to hold the 
upper bits of the addresses. 
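
To make the byte ordering concrete, the following Python sketch splits the values of Example 15-2 into the pieces that are actually sent. It is an illustration only: the function names are hypothetical, and the upper address bits are simply returned, since how the external register is programmed is not covered here.

```python
def split_physical_address(addr20):
    """Split a 20-bit physical address for an 8237-style channel.

    Returns (upper_bits, low_byte, high_byte): the upper bits go to the
    external register mentioned in the text; the low and high bytes are
    written, in that order, to the channel's memory address register port.
    """
    if not 0 <= addr20 <= 0xFFFFF:
        raise ValueError("address must fit in 20 bits")
    upper_bits = addr20 >> 16
    offset = addr20 & 0xFFFF
    return upper_bits, offset & 0xFF, offset >> 8


def split_word(value):
    """Split a 16-bit value into (low_byte, high_byte) for two 8-bit writes."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return value & 0xFF, value >> 8


if __name__ == "__main__":
    upper, low, high = split_physical_address(0x53400)
    print(f"upper bits {upper:X}H, low byte {low:02X}H, high byte {high:02X}H")
    count_low, count_high = split_word(2048)
    print(f"count low {count_low:02X}H, count high {count_high:02X}H")
```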


8237's internal control registers


Although the 8237 has four channels and each channel must be programmed separately
for the base address and count, there is only one set of control/command registers
used by all channels. These registers are shown in Table 15-2. To understand how to access
those registers, look at Example 15-3.

Of these eight registers, only the most essential ones will be explained in detail 
here. The reader can refer to Intel manuals for information concerning others. The func- 
tions of 8237 pins are described in Section 15.4 in the context of some real-life designs, 
such as the x86 PC. 


Example 15-3


Use the circuit in Example 15-1 to find the address of the 8237 DMA control registers.


Solution:


Using Table 15-2 and substituting for A7-A4 gives the information in Table 15-4.


Table 15-4: Address Selection for Example 15-3 


Binary address            Hex
A7 A6 A5 A4 A3 A2 A1 A0   Address  Register Name                     Read/Write
1  0  0  1  1  0  0  0    98       Status/command register           R/W
1  0  0  1  1  0  0  1    99       Request register                  W
1  0  0  1  1  0  1  0    9A       Single mask register bit          W
1  0  0  1  1  0  1  1    9B       Mode register                     W
1  0  0  1  1  1  0  0    9C       Clear byte pointer                W
1  0  0  1  1  1  0  1    9D       Master clear/temporary register   R/W
1  0  0  1  1  1  1  0    9E       Clear mask register               W
1  0  0  1  1  1  1  1    9F       Mask register bits                W

Command register 


This is an 8-bit register used for controlling the operation of the 8237 (see Figure 
15-3). It must be programmed (written into) by the CPU. It is cleared by the RESET sig- 
nal from the CPU or the master clear instruction of the DMA. The function of each bit is 
described below. 

The 8237 is capable of transferring data (1) from a peripheral device to memory 
(reading from disk), (2) from memory to a peripheral device (writing the file into disk), or 
(3) from memory to memory. One example of the use of the memory-to-memory option is 
what is called shadow RAM. In computers such as 386- and 486-based systems, the access 
time of ROM is too long. However, the system can copy the ROM into a portion of RAM 
and allow the CPU to access it from RAM, which has a much shorter access time than 
ROM. 

D0 gives the option to use only channels 0 and 1 for transferring a block of data
from memory to memory. Why the need for two channels? Channel 0 must be used for the
source and channel 1 for the destination. Channel 0 reads the byte into a temporary register
inside the 8237, and then channel 1 will write it to the destination. This is in contrast
to I/O-to-memory or memory-to-I/O transfers, in which the data is read into the data bus
and transferred to the destination, all without being saved anywhere temporarily.

D1 is used only when the memory-to-memory option is enabled and can be used 
to disable the memory incrementation/decrementation of channel 0 in order to write a 
fixed value into a block of memory. 

D2 is used to enable or disable DMA. 

D3 gives the option to choose between the normal memory cycle of 4 clock pulses
and compressed timing of 2 clock pulses per memory cycle. There are 4 clock pulses
per byte after the initial delay, assuming that the high byte address is already latched. If
every byte of transfer requires both high byte and low byte addresses, an extra clock pulse
for the address latch is required, which makes the bus cycle 5 clock pulses. The same is
the case for the compressed option, making it 3 clock ticks per bus cycle.

D4 gives the option of using the four channels on fixed priority or rotating priority.
If fixed priority is chosen, DREQ0 has the highest priority and DREQ3 has the lowest
priority. If more than one DREQ is activated at the same time, the 8237 will always respond to
the one with the highest priority. In rotating mode, DREQ0 again starts with the highest priority
and DREQ3 the lowest, but the 8237 rotates through DREQ0, DREQ1, DREQ2, and
DREQ3 in that order, servicing one request from each if present. In other words, after
DREQ0 is served it will not be served again until the rest of the DREQs have been given a
chance. This prevents monopolization by the DREQ with the highest priority.
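
The difference between the two schemes can be sketched with a small model. The Python below is a simplified illustration under the assumption that the listed DREQ lines stay active the whole time; it is not a description of the chip's internal arbitration logic, and the function name is hypothetical.

```python
def service_order(pending, rounds, rotating):
    """Return the order in which channels are serviced.

    pending  -- set of channel numbers (0 to 3) whose DREQ stays active
    rounds   -- number of service slots to simulate
    rotating -- False for fixed priority, True for rotating priority
    """
    order = []
    highest = 0  # channel that currently holds the highest priority
    for _ in range(rounds):
        # Scan the four channels starting from the highest-priority one.
        for step in range(4):
            channel = (highest + step) % 4
            if channel in pending:
                order.append(channel)
                if rotating:
                    # The channel just served drops to the lowest priority.
                    highest = (channel + 1) % 4
                break
    return order


if __name__ == "__main__":
    print("fixed:   ", service_order({0, 1, 2, 3}, 6, rotating=False))
    print("rotating:", service_order({0, 1, 2, 3}, 6, rotating=True))
```

With all four requests held active, the fixed scheme keeps serving channel 0, while the rotating scheme cycles through the channels in turn.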

D5 allows time for the write signal to be extended for slow devices. 

D6 gives the option of programming the activation level of DREQ. It can be an
active-high or active-low signal.

D7 gives the option of programming the activation level of DACK. It can be an
active-high or active-low signal.

The command byte is issued to this register through port address X8H, where X is
the combination provided to activate CS, as shown in Examples 15-4 and 15-5.


Example 15-4 


Program the command register of the 8237 in Example 15-3 for the following options: no memory- 
to-memory transfer, normal timing, fixed priority, late write, and DREQ and DACK both active- 
high. 


Solution: 


From Figure 15-3, the command byte would be 1000 0000 = 80H and the program is 
MOV AL, 80H ;load the command byte into AL 
OUT 98H, AL ;issue the command byte to port 98H 


Example 15-5 


Assume that the CPU is doing some very critical processing and that the 8237 DMA should be 
disabled. Use the ports in Example 15-3 to show the program. 


Solution: 


To disable the 8237, send 0000 0100 = 04H to the command register as follows: 


MOV AL, 04H 
OUT 98H,AL 
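
The command bytes of Examples 15-4 and 15-5 can be rebuilt bit by bit. The following Python sketch assumes the bit assignments described above for D0 through D7; the function and parameter names are hypothetical and chosen only for readability.

```python
def command_byte(mem_to_mem=False, ch0_address_hold=False, disable=False,
                 compressed=False, rotating=False, extended_write=False,
                 dreq_active_low=False, dack_active_high=False):
    """Assemble an 8237 command byte; each flag maps to one bit, D0 to D7."""
    flags = [
        mem_to_mem,        # D0: memory-to-memory enable
        ch0_address_hold,  # D1: channel 0 address hold enable
        disable,           # D2: controller disable
        compressed,        # D3: compressed timing
        rotating,          # D4: rotating priority
        extended_write,    # D5: extended write selection
        dreq_active_low,   # D6: DREQ sense active low
        dack_active_high,  # D7: DACK sense active high
    ]
    value = 0
    for position, flag in enumerate(flags):
        if flag:
            value |= 1 << position
    return value


if __name__ == "__main__":
    # Example 15-4: all defaults except DACK active high.
    print(f"{command_byte(dack_active_high=True):02X}H")
    # Example 15-5: controller disabled.
    print(f"{command_byte(disable=True):02X}H")
```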


Status register 


This is an 8-bit register that can only be read by the CPU through the same port
address as the command register. This register is often referred to as RO (read only) in PC
documentation. As mentioned above, the port is X8 hex, where X is for CS. It contains various
information about the operating state of the four channels. The lower four bits, D0-D3,
are used to indicate if channels 0-3 have reached their TC (terminal count). TC is set
high when the count register has been decremented to zero. This gives the option to monitor
the count register by software. This monitoring also can be done by hardware through
the EOP pin of the 8237, as we will see in the next section. The upper four bits, D4-D7,
of the status register keep track of pending DMA requests. This information can be used
by the CPU to see which channel has a pending DMA request. See Figure 15-4.
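
As an illustration of how software might interpret a byte read from the status register, the Python sketch below separates the two groups of bits described above. The function name is hypothetical, and the sample value is made up for the demonstration.

```python
def decode_status(status):
    """Decode an 8237 status byte.

    Returns (reached_tc, pending): two lists of channel numbers.
    D0-D3 flag channels that have reached terminal count;
    D4-D7 flag channels with a pending DMA request.
    """
    reached_tc = [ch for ch in range(4) if status & (1 << ch)]
    pending = [ch for ch in range(4) if status & (1 << (ch + 4))]
    return reached_tc, pending


if __name__ == "__main__":
    tc, requests = decode_status(0b00100100)  # sample value (an assumption)
    print("channels at TC:", tc)
    print("channels requesting:", requests)
```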
D7  D6  D5  D4  D3  D2  D1  D0

D0:  0 = Memory-to-memory disable
     1 = Memory-to-memory enable
D1:  0 = Channel 0 address hold disable
     1 = Channel 0 address hold enable
     X = if bit 0 = 0
D2:  0 = Controller enable
     1 = Controller disable
D3:  0 = Normal timing
     1 = Compressed timing
     X = if bit 0 = 1
D4:  0 = Fixed priority
     1 = Rotating priority
D5:  0 = Late write selection
     1 = Extended write selection
     X = if bit 3 = 1
D6:  0 = DREQ sense active high
     1 = DREQ sense active low
D7:  0 = DACK sense active low
     1 = DACK sense active high

Figure 15-3. 8237 Command Register Format


D7  D6  D5  D4  D3  D2  D1  D0

D0:  1 = Channel 0 has reached TC
D1:  1 = Channel 1 has reached TC
D2:  1 = Channel 2 has reached TC
D3:  1 = Channel 3 has reached TC
D4:  1 = Channel 0 request
D5:  1 = Channel 1 request
D6:  1 = Channel 2 request
D7:  1 = Channel 3 request

Figure 15-4. 8237 Status Register Format


Mode register


This register can only be written to by the CPU through port address XBH, where
X is the address combination for CS activation. Of the 8 bits of the mode register, the
lower two, D0 and D1, are used for channel selection. The other 6 bits are used to select
various operation modes to be used for the channel selected by bits D0 and D1. D2 and D3
specify data transfer mode. In the write transfer option, the DMA transfers from an I/O
device (such as a disk) to memory by activating IOR and MEMW. Reading from memory
to an I/O is a read transfer and is achieved by activating MEMR and IOW. The verify
transfer is a pseudo transfer: it is like a read or write except that it does not generate any
control signals, such as IOR, MEMR, and so on. D4 is used for autoinitialization. If enabled,
the memory address register and the count register are reloaded with their original values
at the end of a DMA data transfer (when the count register becomes zero). In this way
those registers are programmed only once and the original values are saved internally. D5
gives the option to increment or decrement the address. D6 and D7 determine the way the
8237 is used. The options are:


1. Demand mode, where the transfer of data continues until DREQ is deactivated or
the terminal count has been reached. This ensures that the DMA can finish the job
without interruption even though it means monopolization of the system buses by
the DMA for the duration of the transfer of the entire block of data.

2. Block mode, which is the same as demand mode except that DREQ can be deactivated
after the DMA cycle starts and the process of data transfer will go on until the
TC (terminal count) state has been reached. In other words, there is no need to keep
the DREQ high for the duration of the data transfer.

3. Single mode, where if DREQ is held active, the DMA transfers one byte of data,
then allows the x86 to gain control of the system bus by deactivating its HRQ for
one bus cycle. This process goes on alternating access to the system bus between
the CPU and DMA until the TC has been reached, and then autoinitialization will
happen if that choice has been made in the control word. This is the option used in
all x86 PCs and compatibles since the DMA and CPU alternately share the system
buses, allowing both to do their job without either monopolizing the buses.

4. Cascade mode, in which several DMAs can be cascaded to expand the number of
DREQs to more than 4. This option is used in the x86 PC, as we will see in Section
15.5. The original PC used only one 8237. Examples 15-6 and 15-7 show the programming
of the mode register and the single mask register.


D7  D6  D5  D4  D3  D2  D1  D0

D1 D0:  00 = Channel 0 select
        01 = Channel 1 select
        10 = Channel 2 select
        11 = Channel 3 select
D3 D2:  00 = Verify transfer
        01 = Write transfer
        10 = Read transfer
        11 = Illegal
        XX = if bits 6 and 7 = 11
D4:     0 = Autoinitialization disable
        1 = Autoinitialization enable
D5:     0 = Address increment select
        1 = Address decrement select
D7 D6:  00 = Demand mode select
        01 = Single mode select
        10 = Block mode select
        11 = Cascade mode select

Figure 15-5. Mode Register Format


Single mask register 


This register can only be written to by the CPU through port address XA hex,
where X is for CS. Of the 8 bits of this register, only three are used. D0 and D1 select the
channel. D2 clears or sets the mask bit for that channel. It is through this register that the
DREQ input of a specific channel can be masked (disabled) or unmasked (enabled). For
example, if the value 00000101 is written to this register, it will mask (block) DREQ1 and
the DMA will not respond to DREQ of channel 1 when DREQ1 is activated. While the
command register can be used to disable the whole DMA chip, this register allows the programmer
to disable or enable a specific channel. The only problem is that only one channel
can be masked or unmasked at a time. To mask or unmask more than one channel, the
all mask register is used. Figure 15-6 shows the single mask register format.


D7-D3:  Not used
D2:     0 = Clear mask bit
        1 = Set mask bit
D1 D0:  00 = Select Channel 0 mask bit
        01 = Select Channel 1 mask bit
        10 = Select Channel 2 mask bit
        11 = Select Channel 3 mask bit


Figure 15-6. 8237 Single Mask Register Format 


All mask register 


In function, this register is similar to the single mask register except that all 4
channels can be masked or unmasked with one write operation. For example, if 00000010
is written to this register, it will mask the DREQ of channel 1 and unmask (enable) the other
channels. See Figure 15-7. Again this register can only be written to by the CPU through
the port address XFH, where X is for CS activation.


D0:  0 = Clear Channel 0 mask bit
     1 = Set Channel 0 mask bit
D1:  0 = Clear Channel 1 mask bit
     1 = Set Channel 1 mask bit
D2:  0 = Clear Channel 2 mask bit
     1 = Set Channel 2 mask bit
D3:  0 = Clear Channel 3 mask bit
     1 = Set Channel 3 mask bit


Figure 15-7. 8237 All Mask Register Format 




Example 15-6 


Program the 8237's mode register of Example 15-3 to select channel 2 to transfer from memory to
I/O using autoinitialization, address increment, and single-byte transfer.

Solution:

From Figure 15-5, with these options, the mode register must have 01011010 = 5AH. The port
address for the mode register is 9BH, which results in
        MOV  AL,5AH
        OUT  9BH,AL
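
The mode byte of Example 15-6 can be assembled field by field. The Python sketch below assumes the field layout of the mode register format (channel in D1-D0, transfer type in D3-D2, autoinitialization in D4, address direction in D5, mode in D7-D6); the names are hypothetical.

```python
TRANSFER = {"verify": 0b00, "write": 0b01, "read": 0b10}
MODE = {"demand": 0b00, "single": 0b01, "block": 0b10, "cascade": 0b11}


def mode_byte(channel, transfer, mode, autoinit=False, decrement=False):
    """Assemble an 8237 mode byte from its individual fields."""
    if channel not in range(4):
        raise ValueError("channel must be 0 to 3")
    return (channel                     # D1-D0: channel select
            | TRANSFER[transfer] << 2   # D3-D2: transfer type
            | int(autoinit) << 4        # D4: autoinitialization enable
            | int(decrement) << 5       # D5: address decrement select
            | MODE[mode] << 6)          # D7-D6: mode select


if __name__ == "__main__":
    # Example 15-6: channel 2, memory to I/O (read transfer),
    # autoinitialization, address increment, single mode.
    value = mode_byte(2, "read", "single", autoinit=True)
    print(f"{value:08b} = {value:02X}H")
```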


Example 15-7 


Program the 8237 of Example 15-3 to enable channel 2.

Solution:

From Figure 15-6, the value for the single mask register to enable (unmask) channel 2 is 0000 0010
= 02H and is sent to port 9AH as follows:
        MOV  AL,02
        OUT  9AH,AL
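
The values written to the two mask registers can be sketched the same way. The Python below is only an illustration of the bit layouts in Figures 15-6 and 15-7, with hypothetical function names.

```python
def single_mask_byte(channel, mask):
    """Byte for the single mask register: D1-D0 = channel, D2 = set/clear."""
    if channel not in range(4):
        raise ValueError("channel must be 0 to 3")
    return (int(mask) << 2) | channel


def all_mask_byte(masked_channels):
    """Byte for the all mask register: bit n set masks channel n."""
    value = 0
    for channel in masked_channels:
        if channel not in range(4):
            raise ValueError("channel must be 0 to 3")
        value |= 1 << channel
    return value


if __name__ == "__main__":
    print(f"{single_mask_byte(2, mask=False):02X}H")  # unmask channel 2
    print(f"{single_mask_byte(1, mask=True):02X}H")   # mask channel 1
    print(f"{all_mask_byte([1]):02X}H")               # mask only channel 1
```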


Master clear/temporary register 


A master clear is performed when the CPU writes to port address XDH,
where X is for CS activation. The byte sent to this register does not matter since it simply
clears the status, command, request, and mask registers and forces the DMA to the idle
cycle. This is the same as activating the hardware RESET of the 8237. If an attempt is
made to read from this register, the DMA will provide the last byte of data that was transferred
during the memory-to-memory transfer. Note that when the DMA is doing an I/O-to-memory
or memory-to-I/O transfer, it transfers the data directly between these two sections
of the computer without bringing it into the DMA, but in memory-to-memory transfers
it must bring each byte into the DMA before it sends it to the destination, since it has
to switch the contents of the address bus for the source and destination. This is similar to
string instruction MOVSB, except that it is performed by the DMA instead of the CPU.


Clear mask register 


This register can be written to by the CPU only through port address XEH, where 
X is for CS. The bit patterns written to it do not matter. Its function is to clear the mask 
bits of all four channels, thereby enabling them to accept the DMA request through the 
DREQs. 


Review Questions 


1. Which address bits are used to select a register inside the 8237? 

2. For an 8237, why are addresses X0H to XFH used to access its internal registers?

3. True or false. To use a channel to transfer data, both the memory address register 
and count register for that channel must be programmed. 

4. State the functions of the memory address register and the count register. 

5. Show instructions to program channel 0 memory address and count registers to 
transfer 4K starting from offset 1440H. Use port addresses from Example 15-1. 

6. In the fixed priority scheme, which channel has the highest priority? 

7. True or false. Programming some control registers is optional (depending on how 
the 8237 is used), but the command register must always be programmed. 

8. True or false. The command register is accessed by CS = 0 and A3-A0 = 1000.

9. True or false. The mode register is accessed by CS = 0 and A3-A0 = 1011.

10. True or false. The level of activation, high or low, for DREQ and DACK of each
channel can be programmed.




SECTION 15.3: 8237 DMA INTERFACING IN THE IBM PC 


As shown in Figure 15-8, the 8237 DMA has eight address pins, A0-A7. Four of these,
A0-A3, form a bidirectional address bus. When the CPU programs the 8237, they carry
addresses into the chip to select one of the 16 possible registers, assuming that chip
select is activated. In the IBM PC, chip select is activated by Y0 of the 74LS138, as
shown in Figure 15-9. The address selection of the registers inside the 8237 is summarized
in Table 15-5, assuming zero for each x. The conditions for A6, A7, A8, A9, and AEN were
discussed in Chapter 12 and will not be repeated here. From Table 15-5 it can be seen that
port addresses 00-07 are assigned to the four channels, and 08-0F are assigned to the
control registers commonly used by all the channels.
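The port decoding described above can be sketched in a few lines of Python. This is only an illustrative model, not code from the PC BIOS: the function name describe_port is hypothetical, and the names given to ports 09H, 0AH, 0CH, and 0FH are assumed from the 8237 data sheet rather than from the text above.

```python
# Control registers shared by all four channels (ports 08H-0FH in the PC).
CONTROL_REGISTERS = {
    0x08: "status/command register",
    0x09: "request register",
    0x0A: "single mask register bit",
    0x0B: "mode register",
    0x0C: "clear byte pointer flip-flop",
    0x0D: "master clear/temporary register",
    0x0E: "clear mask register",
    0x0F: "mask register bits",
}

def describe_port(port):
    """Describe an 8237 port address in the PC range 00H-0FH."""
    if not 0x00 <= port <= 0x0F:
        raise ValueError("port is outside the 8237 range 00H-0FH")
    if port <= 0x07:
        channel = port >> 1                                # A2,A1 select the channel
        kind = "count" if port & 1 else "memory address"   # A0 selects the register
        return f"channel {channel} {kind} register"
    return CONTROL_REGISTERS[port]

for p in (0x00, 0x05, 0x08, 0x0B, 0x0D):
    print(f"{p:02X}H: {describe_port(p)}")
```

Each channel owns an even/odd pair of ports, which is why eight addresses cover the four channels and the remaining eight are left for the shared control registers.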


8237 and 8088 connections in the IBM PC


Since the DMA must be capable of transferring data between I/O and memory without
any interference from the CPU, it must have all the required control, data, and address
buses. Looking at Figure 15-10, one can see that the 8237 has its own data bus, D0-D7.
This is a bidirectional bus connected to the system bus D0-D7. It also has all four control
signals, IOR, IOW, MEMR, and MEMW. However, its address bus, A0-A7, is only 8 bits. If the
8237 can transfer up to 64K bytes of data between I/O and memory, it must have 16 address
lines, A0-A15. Where are the other eight address pins, A8-A15? The answer is that the high
byte of the 16-bit address changes only once for every 256 bytes transferred, so D0-D7 are
used by the 8237 to send out the upper part of the address whenever A0-A7 rolls over from
FF to 00. There must be a device to latch and hold the A8-A15 part of the address from the
D0-D7 data bus. This is the function of the 74LS373. The function of ADSTB (address
strobe) is to activate the latch whenever the 8237 provides the upper 8-bit address through
the data bus. Similar to ALE, ADSTB goes high only when D0-D7 are used to provide the upper
address, meaning that as long as ADSTB stays low, D0-D7 are a normal data bus. Figure 15-11
diagrams the 8237 circuit connection.


Figure 15-8. 8237A DMA Pin Layout


Figure 15-9. Chip Selection of the 8237A in the PC


It should be noted that A0-A3 and all control buses are bidirectional, so that when
programming the 8237 they can be used to communicate with the internal registers. As long
as the CPU is idle and the DMA is in control of transferring data, A0-A7 and all the
control buses are unidirectional.


Figure 15-10. Block Diagram of the 8237A DMA (microprocessor interface signals: A0-A3,
A4-A7, DB0-DB7, ADSTB, AEN, MEMR, MEMW, IOR, IOW, READY, RESET, CLK; DMA handshake signals)
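The role of ADSTB and the external latch can be illustrated with a small Python model. It is a sketch under stated assumptions, not a cycle-accurate description of the 8237: the function name dma_address_sequence is hypothetical, address-increment mode is assumed, and the latch is assumed to be loaded on the first cycle of a transfer as well as on every rollover of A0-A7.

```python
def dma_address_sequence(start, count):
    """Yield (address, adstb) for each byte of a transfer in address-increment mode.

    adstb is True on cycles where the upper byte A8-A15 is placed on D0-D7
    and strobed into the external latch.
    """
    latch = None                        # contents of the external 8-bit latch
    address = start & 0xFFFF
    for _ in range(count):
        high, low = address >> 8, address & 0xFF
        adstb = latch != high           # first cycle, or A0-A7 rolled over
        if adstb:
            latch = high                # latch captures A8-A15 from the data bus
        yield (latch << 8) | low, adstb
        address = (address + 1) & 0xFFFF

cycles = list(dma_address_sequence(0x14FE, 4))
for addr, adstb in cycles:
    print(f"{addr:04X}H  ADSTB={'1' if adstb else '0'}")
```

In this run the strobe is active for the first byte and again when the low byte wraps from FF to 00, while the two other cycles reuse the value already held in the latch.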


One last point about A0-A15 from the 8237 is that the system bus can be used by
the CPU only when the 8237 is not functioning. This is ensured through the AEN signal.
This signal was discussed in Chapter 9 and is summarized here.


    AEN
    0     x86 is in control of the system bus
    1     8237 DMA is in control of the system bus


The rest of the pins in Figure 15-10 are described below. 


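Table 15-5: PC 8237 Internal Register Port Addresses


Binary Address               Hex
A7 A6 A5 A4  A3 A2 A1 A0     Address   Function                            R/W

0  0  0  x   0  0  0  0      00        CH0 memory address register         R/W
0  0  0  x   0  0  0  1      01        CH0 count register                  R/W
0  0  0  x   0  0  1  0      02        CH1 memory address register         R/W
0  0  0  x   0  0  1  1      03        CH1 count register                  R/W
0  0  0  x   0  1  0  0      04        CH2 memory address register         R/W
0  0  0  x   0  1  0  1      05        CH2 count register                  R/W
0  0  0  x   0  1  1  0      06        CH3 memory address register         R/W
0  0  0  x   0  1  1  1      07        CH3 count register                  R/W
0  0  0  x   1  0  0  0      08        Status/command register             R/W
0  0  0  x   1  0  0  1      09        Request register                    W
0  0  0  x   1  0  1  0      0A        Single mask register bit            W
0  0  0  x   1  0  1  1      0B        Mode register                       W
0  0  0  x   1  1  0  0      0C        Clear byte pointer                  W
0  0  0  x   1  1  0  1      0D        Master clear/temporary register     R/W
0  0  0  x   1  1  1  0      0E        Clear mask register                 W
0  0  0  x   1  1  1  1      0F        Mask register bits                  W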




Figure 15-11. 8237 DMA Circuit Connection in the PC

RESET is the input coming from the RESET of 8284. 


CS is from the 74LS138 decoder, as shown earlier. 
READY input is from the RDYDMA of the wait state generation circuitry. The 
purpose of this is to extend the memory cycle of the DMA. 




HOLD and HLDA are connected to the pins with the same name on the x86 CPU. 


EOP (end of process) is inverted and becomes TC (terminal count). This signal is 
activated whenever the count register of any of the four channels is decremented to zero. 
This signal could be used with the DACK of a specific channel to prevent multiple DMA 
requests from that channel at the same time or could be used to inform the requesting 
device that the DMA has finished the job and it should deactivate its DREQ. In other 
words, EOP is a hardware pin indicating that the counter has reached zero. Using software 
one can monitor the count register of each channel by reading the status register, as was 
shown in Section 15.2. 


DREQ0 and DACK0 are the signals for channel 0 and are used for refreshing
DRAM as explained in Section 15.4. While DREQ0 is active high, DACK0 is programmed
to be active low by BIOS, as shown in Section 15.4.


DREQ1-DREQ3 and DACK1-DACK3 are the signals for channel 1 to channel 
3, and are available through the expansion slot. The assignment of these channels is dis- 
cussed next. 


Channel assignment of the 8237 in the IBM PC 


In the original IBM PC, each of the four channels of the 8237 is assigned in the 
following fashion. 
1. Channel 0 for refreshing DRAM. In the later x86 PC this practice was abandoned. 
2. Channel 1 is unused, but in many implementations it is used for networks. 
3. Channel 2 usually is used for the floppy disk controller. 
4. Channel 3 normally is used for the hard disk controller. 

Inspecting IBM BIOS shows that 8237 channels 1, 2, and 3 have been initialized
by programming the mode register. The mode register, which must be sent to port address
0BH, is as follows for channel 1 (from Figure 15-5):


    D1,D0 = 01   for channel 1
    D3,D2 = 00   for verify transfer
    D4    = 0    autoinitialization disable
    D5    = 0    for address increment
    D7,D6 = 01   for single byte mode


    D7      D0
    0100 0001 = 41H   mode register for channel 1
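As a cross-check on the bit assignment above, the mode byte can be assembled field by field. The Python helper below is a hypothetical illustration (mode_byte and its parameter names are not from the book or the BIOS); it simply packs the five fields listed above into one byte.

```python
def mode_byte(channel, transfer=0b00, autoinit=False, decrement=False, mode=0b01):
    """Assemble an 8237 mode register byte from the fields listed above.

    channel   : D1,D0  (0-3)
    transfer  : D3,D2  (00 = verify)
    autoinit  : D4     (False = autoinitialization disabled)
    decrement : D5     (False = address increment)
    mode      : D7,D6  (01 = single byte mode)
    """
    if not 0 <= channel <= 3:
        raise ValueError("the 8237 has channels 0-3 only")
    return (mode << 6) | (decrement << 5) | (autoinit << 4) | (transfer << 2) | channel

for ch in (1, 2, 3):
    print(f"channel {ch}: {mode_byte(ch):02X}H")
```

Only the two channel-select bits differ from one channel to the next, so the three bytes are consecutive values starting at 41H.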


For channels 2 and 3, the value for the mode register is the same except that D1
and D0 are changed to 10 and 11, respectively. Therefore, channels 2 and 3 have mode
register values of 42H and 43H. The program could look like the following code.


    MOV  DX,000BH    ;load the mode register address
    MOV  AL,41H      ;chan1 mode reg value
    OUT  DX,AL
    MOV  AL,42H      ;chan2 mode reg value
    OUT  DX,AL
    MOV  AL,43H      ;chan3 mode reg value
    OUT  DX,AL


The way IBM BIOS does the initialization is slightly more compact:

E136 B20B     474        MOV   DL,0BH     ; DX=000B
E142 B103     481        MOV   CL,03
E144 B041     482        MOV   AL,41H     ; SET MODE FOR CHANNEL 1
E146          483 C18A:
E146 EE       484        OUT   DX,AL
E147 FEC0     485        INC   AL         ; POINT TO NEXT CHANNEL
E149 E2FB     486        LOOP  C18A

(Reprinted by permission from "IBM BIOS Technical Reference" c. 1984 by International Business Machines Corp.) 


These channels are programmed by the device that uses them when the device is
installed. For example, the hard disk controller ROM programs channel 3 according to its 
specifications. 


Review Questions 


1. What port addresses are assigned to the 8237 in the original PC? 

2. What port address is assigned to the command register of the 8237 in the original 
PC? 

3. If the 8237 has only A0-A7, how is the 16-bit address A0-A15 provided?

4. The 8237 in the original IBM PC is programmed to have channel ___ as the highest
priority and channel ___ as the lowest priority.


SECTION 15.4: DMA IN x86 PCs 


The original IBM PC had only three DMA channels available through the expan- 
sion slot. All these channels were designed for 8-bit data transfer. To expand the capabili- 
ty of the PC, designers of the x86 PC added the second 8237 and made it a 16-bit data 
transfer DMA. This is shown in Figure 15-12. 


Figure 15-12. 80286 (and Higher) PC DMA


8237 DMA #1 


To maintain compatibility with the original PC, DREQ1, DREQ2, and DREQ3 of
DMA #1 are available through the expansion slot and are for 8-bit data transfer between
8-bit I/O and the 16 MB memory range of the x86. The ports assigned to DMA #1 are
exactly the same as in the PC. The x86 PCs abandoned the idea of refreshing DRAM using
DMA channel 0 and replaced it with dedicated DRAM refresh circuitry. This made channel 0
available through the ISA expansion slot. The signals associated with channel 0 are
DREQ0 and DACK0, and they are accessed through the 36-pin edge of the ISA slot. This is
shown in Figure 15-13.




Figure 15-13. ISA (IBM PC) Bus Slot Signals




The following points must be noted regarding channels 0, 1, 2, and 3 of DMA #1 
in x86 PC computers. 


1. Channels 0, 1, 2, and 3 can be used only for data transfer between 8-bit I/O and sys- 
tem memory. The system memory address can be on an odd-byte or even-byte 
boundary. 

2. Since the count register is a 16-bit register, each of channels 0, 1, 2, and 3 can 
transfer up to a 64K-byte block of data. 

3. Each channel, 0, 1, 2, or 3, can transfer data in 64K-byte blocks throughout the 
16M system memory address space. 


8237 DMA #2 


The second 8237 DMA is connected as master (level 1) and its channel 0 is used
for cascading of DMA #1, as shown in Figure 15-12. The other three channels of this DMA
are available through the expansion slot (36-pin edge) under the DREQ5 and DACK5, DREQ6
and DACK6, and DREQ7 and DACK7 designations. These three channels must be used
for 16-bit data transfer.


Points to be noted regarding 16-bit DMA channels 


Channels 5, 6, and 7 of DMA #2 are used exclusively for 16-bit data transfer 
between the 16 MB memory address space and I/O peripherals. The following points must 
be noted regarding their use. 


1. Channels 5, 6, and 7 must be used for 16-bit data transfers between 16-bit system 
memory and 16-bit I/O adapters. Notice that the I/O must support 16-bit data. 

2. The number of 16-bit (2-byte) words to be transferred is programmed into the count 
register of channels 5, 6, and 7. Since the count register is a 16-bit register, each 
channel can transfer up to 65,536 words or 128K bytes between I/O and memory. 

3. The memory address for a DMA memory transfer must be on an even-byte address
boundary.

4. Channels 5, 6, and 7 transfer data in blocks that have a maximum size of 128K 
bytes throughout the 16M system memory. 

5. Since channels 5, 6, and 7 cannot transfer data on an odd-byte boundary, A0 and
BHE are both forced to 0.

6. DMA #2 can be accessed (programmed) by another master from the expansion slot 
using the MASTER input signal on the 36-pin part of the ISA bus. 
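The block-size limits quoted for the 8-bit and 16-bit channels follow directly from the 16-bit count register. The Python sketch below restates that arithmetic; the function names are hypothetical, and the alignment check simply encodes the even-address rule given above for channels 5, 6, and 7.

```python
COUNT_REGISTER_BITS = 16          # width of each channel's count register
ADDRESS_SPACE_BITS = 24           # 16M system memory address space

def max_block_bytes(bytes_per_transfer):
    """Largest block a channel can move with one programming of its count register."""
    return (1 << COUNT_REGISTER_BITS) * bytes_per_transfer

def valid_start_address(address, bytes_per_transfer):
    """True if the address lies in the 16M space and is aligned to the transfer size."""
    in_range = 0 <= address < (1 << ADDRESS_SPACE_BITS)
    return in_range and address % bytes_per_transfer == 0

print(max_block_bytes(1) // 1024, "K bytes per block on channels 0-3")
print(max_block_bytes(2) // 1024, "K bytes per block on channels 5-7")
print(valid_start_address(0x1501, 2))   # odd address on a 16-bit channel
```

The same count value therefore moves twice as many bytes on a 16-bit channel, at the cost of requiring an even starting address.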


DMA channel priority 


The BIOS of the x86 PC programs both DMAs to have channel 0 as the highest 
priority. This means that of the seven DMA channels available through the expansion slot 
of the x86 ISA bus PC, channel 0 has the highest priority and channel 7 the lowest prior- 
ity. This is due to the fact that the master DMA (8237 #2) has channel 0 as the highest pri- 
ority and since the slave 8237 #1 is connected to it, channels 0 through 3 have higher pri- 
ority than channels 5, 6, and 7. Therefore, we have the following: 


    x86 ISA DMA Channel Priority

    channel 0     Highest priority
    channel 1
    channel 2
    channel 3
    channel 5
    channel 6
    channel 7     Lowest priority
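The fixed priority order above can be expressed as a simple lookup. The following Python sketch is illustrative only (next_channel is a hypothetical name); it models which of several simultaneous requests is serviced first, and it omits channel 4 because that channel (channel 0 of 8237 #2) is used for cascading 8237 #1.

```python
# Fixed priority order of the ISA DMA channels, highest first.
PRIORITY_ORDER = (0, 1, 2, 3, 5, 6, 7)

def next_channel(pending):
    """Return the pending channel that is serviced first, or None if none is pending."""
    for channel in PRIORITY_ORDER:
        if channel in pending:
            return channel
    return None

print(next_channel({6, 2, 5}))   # an 8-bit channel wins over the 16-bit channels
print(next_channel({7, 5}))
```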


Figure 15-14 shows an example of DMA and I/O write cycle timing. 


Figure 15-14. DMA Memory Read and I/O Write Bus Cycle for Many x86 and Later PCs
(bus states shown: DMA Idle, DMAT1, DMAT2, DMAT3, DMAT4, DMA Idle)


Review Questions 


1. True or false. In the ISA bus, all channels of the DMA #1 are available on the
expansion slot.


2. How many channels of the DMA are available on the ISA bus? 

3. Indicate on what part of the ISA (62- or 36-pin) bus the channels of the DMA are 
accessible. 

4. True or false. Channels 5, 6, and 7 can be used for 8- or 16-bit data transfers. 

5. Why is bus control in the IBM PC, PS, and compatibles alternated between DMA 
and the CPU? 

PROBLEMS 


SECTION 15.1: CONCEPT OF DMA 


1. Compare the rate of data transfer between the 8088 CPU and DMA. For DMA, assume
that it takes 4 clocks to transfer a byte. How many times faster is DMA?

2. Calculate the time needed to transfer 512 bytes by the 8088 and by DMA in Problem
1. Assume 200 ns for each clock period.

3. Explain the difference between the CPU's response to signals INTR and HOLDR.

4. For the CPU, HOLDR is an ___ (input, output) signal and HOLDA is an ___
(input, output) signal.

5. In response to activation of HOLDR, the CPU finishes the current ___ before
handing the buses to DMA.
(a) instruction (b) bus cycle (c) subroutine

6. The DMA cannot take over the buses until signal ___ is activated by the CPU.

7. Why is it much less expensive to design a DMA chip than a CPU chip?

8. At what point does the CPU regain control over the buses?



SECTION 15.2: 8237 DMA CHIP PROGRAMMING 


9. There are a total of ___ port addresses assigned to an 8237.

10. Which of the following port addresses cannot be assigned to the 8237 DMA, and why?
(This question is not PC compatible.)
(a) 88H (b) 80H (c) 92H (d) F0H

11. For a DMA channel to transfer data it must have two sets of information. State these.
How many port addresses are assigned to each set?

12. Explain why a total of eight port addresses are set aside for the 8237 channels.

13. If CS is activated by A7-A4 = 0101, give the port addresses assigned to the four
channels of the 8237.

14. In Problem 13, what are the port addresses of the 8237 internal control registers?

15. Which register inside the 8237 is used to program the activation level (low or high) of
the DREQ and DACK pins? In Problem 14, what port address is that?

16. In fixed priority, which channel has the highest priority? Which has the lowest? How
is this different from rotating priority?

17. Assume that the 8237 is programmed for fixed priority. If DREQ2 and DREQ4 are
activated at the same time, who gets serviced first?

18. The ___ (x86 CPU, 8237 DMA) resolves channel priority.

19. State the function of the status register TC bits in the 8237. Can we write into it?

20. Program the mode register in Problem 14 for I/O-to-memory transfer, autoinitialization,
address decrement, and block mode for channel 2.

21. Program the single mask register in Problem 14 to enable channel 3.

22. Show the programming of the memory address and count registers of channel 3 to
transfer 8K bytes of data from I/O to memory starting at address 1500H.


SECTION 15.3: 8237 DMA INTERFACING IN THE IBM PC 


23. To access a memory block, explain how many address bits are provided by the 8237,
and how they are provided.

24. The 8237 can access 64K-byte blocks of memory. Explain how the 8237 in the IBM
PC can access the entire 1M address range.

25. True or false. For every DREQ there is a DACK in the 8237.

26. True or false. For every DREQ there is a HOLD in the 8237.

27. State if each of the following pins is input, output, or both for the 8237.
(a) HOLD (b) HOLDA (c) DREQs (d) DACKs (e) A3-A0
(f) ADSTB (g) IOR (h) IOW (i) MEMR (j) MEMW


SECTION 15.4: DMA IN x86 PCs 


28. True or false. In the ISA bus, the use of channel 0 for DRAM refreshing was
abandoned.

29. In the ISA bus, how many channels are available through the expansion slot? Indicate
on what part (62-pin or 36-pin) they are available.

30. Why do you think there are two separate performance benchmarks for memory-intensive
and disk-intensive applications?

31. Although 386 and higher PCs have 32-bit address buses, the DMA uses only 24-bit
addresses in ISA type PCs. Explain why and state the implication.




ANSWERS TO REVIEW QUESTIONS 


SECTION 15.1: CONCEPT OF DMA 


True 
True 
True 
True 
HOLD 
HLDA 
Input 
Output 




SECTION 15.2: 8237 DMA CHIP PROGRAMMING 


1. A0, A1, A2, A3, and CS

2. Because A0-A3 gives rise to only 16 possibilities, 0-F hex

3. True

4. The memory address of the first byte of the block of data to be transferred is loaded in
the memory address register and the number of bytes to be transferred is loaded into
the count register.

5.  MOV  AX,1440H    ;LOAD LOWER 4 DIGITS OF START ADDRESS
    OUT  90H,AL      ;SEND OUT THE LOW BYTE OF THE ADDRESS
    MOV  AL,AH
    OUT  90H,AL      ;SEND OUT THE HIGH BYTE OF THE ADDRESS
    MOV  AX,4096     ;LOAD BLOCK SIZE INTO AX
    OUT  91H,AL      ;SEND OUT THE LOW BYTE OF THE COUNT
    MOV  AL,AH
    OUT  91H,AL      ;SEND OUT THE HIGH BYTE OF THE COUNT

6. Channel 0

7. True


SECTION 15.3: 8237 DMA INTERFACING IN THE IBM PC 


1. 00-0F hex
2. 08 hex
3. It is provided through the D0-D7 data pins of the 8237 to the 74LS373 latch only
when A0-A7 rolls over from FF to 00.
4. 0, 3


SECTION 15.4: DMA IN x86 PCs 


1. True

2. Only 7

3. Channels 1-3 on the 62-pin section and channels 0, 5, 6, and 7 on the 36-pin section

4. False; they can be used only for 16-bit data transfers.

5. This is because in the mode register initialization, the DMA is programmed for the
single mode. This means that for every DMA cycle there is a CPU cycle in between, making
the DMA bus bandwidth much lower than if the DMA had control over the buses
for the entire duration of the data transfer. The real reason is that the buses must be
released by the DMA in order to allow refreshing of DRAM before it loses the data.




CHAPTER 16 


VIDEO AND VIDEO ADAPTERS 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Determine the quality of a monitor by technical features such as
   resolution, dot rate, horizontal and vertical frequency, and dot pitch
>> Describe how images are produced on the screen by the method called
   raster scanning
>> Explain the function of the video adapter board and its two
   components: video display RAM and the video controller
>> Describe the differences in text and graphics modes
>> State the purpose of the attribute byte and how it affects storage space
   in video display RAM
>> Write Assembly language programs to manipulate text data on the
   screen using INT 10H
>> Describe the relation between the number of colors available for a
   monitor and the amount of video memory needed
>> Write an Assembly language program to program pixels on the screen


Although the quality of video monitors has improved dramatically since the intro- 
duction of the first IBM PC in 1981, the principles behind them have remained the same. 
This chapter will look at the video system of the x86 PCs. 


SECTION 16.1: PRINCIPLES OF MONITORS AND VIDEO 
MODES 


Video monitors use a method called raster scanning to display images on the
monitor screen. This method uses a beam of electrons to illuminate phosphor dots, called
pixels, on the screen. The electron gun scans from the top left corner of the screen to the
bottom right, one line at a time. As the gun turns on and off, it moves from left to right
toward the end of the line, at which time it is turned off to move back to the beginning of
the next line. This moving back while the gun is off is called horizontal retrace. When it
reaches the bottom right of the screen, the gun is turned off and moves to the top left of
the screen. This turning off and moving back to the top is called vertical retrace. Figure
16-1 shows two methods of scanning. One is noninterlaced (normal) scanning and the
other is interlaced scanning. The concept for both is the same, but in interlaced scanning
(which is the same method used in television sets), each frame is scanned in two passes.
First the odd lines are scanned and then the gun comes back to scan the even lines. This
method can create flicker, but allows better vertical resolution at a lower cost.
Noninterlaced monitors provide much better, flicker-free images than do interlaced monitors
and for this reason are used as the monitors of most PCs.


Figure 16-1. CRT Scanning Methods (noninterlaced scan and interlaced scan, showing
horizontal retrace, vertical retrace, and odd and even lines)


How to judge a monitor 


The resolution of the screen depends upon the following factors: 


1. The number of pixels (dots) per scanned line
2. The speed at which the gun can turn on and off the phosphors on the surface of the
tube
3. The speed at which it can scan and retrace a horizontal line
4. The number of scan lines per screen (frame)
5. The speed at which it finishes one frame and performs the vertical retrace.

While in a television set, horizontal scanning is done at the rate of 15,750 times
per second (15.75 kHz) and vertical scanning at 60 times per second (60 Hz), on the IBM
PC monochrome monitor using the monochrome display adapter (MDA) the frequencies
are 18.432 kHz and 50 Hz, respectively. Knowing these two frequencies enables one to
calculate the maximum number of scan lines per screen by dividing the horizontal
frequency by the vertical one as follows:


number of scanned lines per screen = horizontal freq. (HF) / vertical freq. (VF) 
(not all visible) 


Example 16-1 


In an IBM PC monochrome monitor with HF = 18.432 kHz and VF = 50 Hz, calculate the number
of scanned lines per screen.


Solution:


The number of scanned lines = HF/VF; therefore, 18,432 divided by 50 = 368.64, or 368
complete lines per screen.
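The calculation in Example 16-1 can be repeated for any pair of frequencies. The short Python sketch below is only an illustration of the HF/VF formula (the function name scan_lines_per_frame is hypothetical); it uses the MDA and television frequencies quoted in the text.

```python
def scan_lines_per_frame(horizontal_hz, vertical_hz):
    """Whole number of horizontal lines scanned in one frame (not all are visible)."""
    return int(horizontal_hz // vertical_hz)

print("MDA:", scan_lines_per_frame(18_432, 50), "lines per screen")
print("TV: ", scan_lines_per_frame(15_750, 60), "lines per screen")
```

The fractional part of the quotient is discarded because only complete lines are counted.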


Not all 368 horizontal lines in Example 16-1 are visible on the screen since some 
lines are for overscan and some are used for vertical retrace time. Overscan refers to the 
lines above or below the visible portion of the screen; these lines ensure clear edges at the 
top and bottom of the screen. In the IBM PC MDA, only 350 lines are visible on screen 
and of the remaining 18 (368-350), some (about 3 or 4) are used for overscanning. The 
time that would have been taken for scanning the rest (approximately 14) is used for the 
vertical retrace. Now that the number of scan lines is known, the next question is, how 
many pixels (dots) are there per line? This is calculated by dividing the video frequency 
(sometimes called dot frequency) by the horizontal frequency: 


number of pixels per scan line = dot frequency / horizontal frequency 
(not all visible) 


In the IBM monochrome display adapter (MDA), dot frequency is 16.257 MHz,
which, when divided by HF = 18.432 kHz, gives 882 pixels per line. Again, not all 882
pixels are visible. With the IBM PC monochrome adapter, only 720 pixels are visible for
each scan line. The time set aside for the remaining 162 is used for horizontal retrace
and for overscanning on the left and right sides of the screen. Again, this overscanning
allows sharp edges on the right and left sides of the screen. From the above
discussion it can be seen that the three most critical factors in a monitor are:


1. The video frequency (also referred to as dot rate, pixel rate, or video bandwidth) 
2. The horizontal frequency 
3. The vertical frequency 
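The way these three frequencies determine the screen geometry can be sketched in Python. This is a minimal illustration using the MDA figures quoted above; the function name pixels_per_line and the variable names are hypothetical.

```python
def pixels_per_line(dot_hz, horizontal_hz):
    """Pixels clocked out during one horizontal line period (not all are visible)."""
    return round(dot_hz / horizontal_hz)

DOT_HZ, HF_HZ, VF_HZ = 16_257_000, 18_432, 50    # MDA frequencies from the text
VISIBLE_PIXELS = 720                              # visible pixels per line on the MDA

total = pixels_per_line(DOT_HZ, HF_HZ)
lines = HF_HZ // VF_HZ
print(total, "pixels per line,", total - VISIBLE_PIXELS, "of them in retrace and overscan")
print(lines, "lines per screen")
```

The difference between the total and the visible count is the time budget that the monitor spends on horizontal retrace and overscan.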


From these three parameters, the number of pixels per line and the number of 
lines per screen can be calculated, keeping in mind that not all pixels and lines are visible 
on the screen, due to overscanning and retrace times. The number of visible pixels is given 
by the manufacturer of the adapters or monitors. Looking at Table 16-1 for the IBM mono- 
chrome adapter, the number of pixels is 720 x 350, which means that there are 720 pixels 
per line and 350 lines per screen, giving a total of 252,000 pixels. The total number of pix- 
els (dots) per screen is a major factor in assessing a monitor's resolution, which is one of 
its most critical characteristics. The total number of pixels per screen is determined by the 
size of the pixel and how far apart pixels are spaced. For this reason one must look at what 
is called the dot pitch in monitor specifications. 




Table 16-1. Adapter Characteristics


Dot pitch


Dot pitch is the distance between adjacent pixels (dots) and is given in millimeters.
For example, a dot pitch of 0.31 means that the distance between pixels is 0.31 mm.
Consequently, the smaller the size of the pixel itself and the smaller the space between
pixels, the higher the total number of pixels and the better the resolution. Dot pitch
varies from 0.6 mm in some low-resolution monitors to 0.2 mm in higher-resolution monitors.
In some video monitor specifications, the figure is given instead as the number of dots
per inch, which is the same way it is given for laser printers, for example, 300 DPI
(dots per inch).


Dot pitch and monitor size


Monitors, like televisions, are advertised according to their diagonal size. For 
example, a 14-inch monitor means that its diagonal measurement is 14 inches. There is a 
relation between the number of horizontal and vertical pixels, the dot pitch, and the diag- 
onal size of the image on the screen. The diagonal size of the image must always be less 
than the monitor's diagonal size. The following simple equation can be used to relate 
approximately these three factors to the diagonal measurement. It is derived from the 
Pythagorean theorem: 


(image diagonal size)² = (number of horizontal pixels x dot pitch)²
                         + (number of vertical pixels x dot pitch)²


Since the dot pitch is in millimeters, the size given by the equation above would 
be in mm, so it must be multiplied by 0.039 to get the size of the monitor in inches. See 
Example 16-2. 


As can be seen from the above discussion, one can use a lower vertical frequency
to get a higher number of vertical pixels. However, this can result in flickering, as
happens in 1024 x 768 interlaced monitors. The interlaced method is a cheap way of
increasing the vertical pixels by halving the vertical frequency and scanning the frame
in two successive sweeps of odd and even fields.


Example 16-2


A manufacturer has advertised a 14-inch monitor of 1024 x 768 resolution with a dot pitch of 0.28.
Calculate the diagonal size of the image on the screen. It must be less than 14 inches.


Solution:


The calculation is as follows:


(diagonal size)² = (1024 x 0.28 mm)² + (768 x 0.28 mm)²
diagonal size = 358.4 mm
diagonal size (inches) = 358.4 mm x 0.039 inch per mm = 13.98 inches
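The diagonal calculation of Example 16-2 can be checked with a few lines of Python. This is a sketch that follows the text's own method, including its rounded conversion factor of 0.039 inch per mm; the function name image_diagonal_inches is hypothetical.

```python
import math

MM_TO_INCH = 0.039    # rounded conversion factor used in the text

def image_diagonal_inches(h_pixels, v_pixels, dot_pitch_mm):
    """Diagonal of the displayed image, from the Pythagorean relation in the text."""
    width_mm = h_pixels * dot_pitch_mm
    height_mm = v_pixels * dot_pitch_mm
    return math.hypot(width_mm, height_mm) * MM_TO_INCH

d = image_diagonal_inches(1024, 768, 0.28)
print(f"{d:.2f} inches")
```

For the 1024 x 768 monitor with a 0.28 mm dot pitch, the result is just under 14 inches, so the image fits the advertised tube size.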


Phosphors 


Another factor that determines the quality of the monitor is the type of phosphors 
used, since the brightness of the pixels depends on two factors: 


1. The intensity of the electron beam, which decides how bright or dark each pixel 
should be. This can be controlled by software, as will be seen in Section 16.4. 

2. The phosphor material used. After the pixel has been illuminated, some phosphor
materials retain their brightness for longer periods of time than others. This
characteristic, called persistence, is fixed in the monitor and cannot be changed. In
situations where the scanning frequency cannot be adjusted, such as in the early IBM PC
monochrome monitors, a phosphor with longer persistence is used to compensate and give a
steadier image.


Color monitors 


In color monitors the principles are the same, except that every pixel is made of
three phosphor dots of different colors: red, green, and blue, hence the name RGB
(red/green/blue) monitors. Color monitors require three different wires to carry the
signals for the three electron beams, one for each color, unless the monitor is of the
composite type. In composite monitors there is one single wire that carries all three
colors. The process of combining, then separating, the three colors diminishes the quality
of the image compared to true RGB monitors. In older RGB monitors, for every color wire
there was a separate gun, but in newer models a single gun produces the three beams.
Another difference between color and monochrome monitors is the presence of the shadow
mask on color monitors. The shadow mask is a metal plate with many holes that is placed
just before the phosphor-coated screen in order to coordinate the shooting of the electron
beam of each gun through a single hole. This ensures that the red gun illuminates the red
dot only, the blue gun illuminates the blue dot only, and the green gun illuminates the
green dot only.

Since each pixel has three dots of color, red, green, and blue, how is the dot pitch 
measured? The answer is that the dot pitch on color monitors is the distance between two 
dots of the same color, or as some manufacturers advertise, the distance between two con- 
secutive holes of the shadow mask. While in monochrome monitors, it is changing the 
intensity of the electron gun that generates the shades of gray-black-white, in color mon- 
itors it is the combination of the three primary colors that generates all other colors. In 
other words, by changing the intensity of the red, green, and blue triad, one can create all 
the colors (see the next section for some examples). 


Analog and digital monitors 


Another monitor characteristic to be explained is digital versus analog. Digital
monitors, such as the MDA- and CGA-based monitors, use a number of bits to specify
variations of color and intensity. To increase these variations one must employ a larger
number of bits. Analog monitors, because they can accommodate many more variations, have
much better quality pictures. To understand the difference between digital and analog
systems, imagine that we have defined a temperature of 20 degrees as cold and 100 degrees
as hot. These would be represented in digital as 0 and 1 for cold and hot, but an analog
system could accommodate temperature variations of 20 to 100 in increments of one,
allowing many more variations. Another example is the state of a light bulb. In digital it
is represented as 0 and 1 for off and on. In analog, one can accommodate many more
variations, similar to the concept of using a dimmer switch.




Video display RAM and video controller 


Communication between the system board (motherboard) and the video display 
monitor is through the video adapter board. Among the components every video board 
must have is a video controller and video display RAM. The information displayed on the 
monitor (either text or graphics) is stored in memory called video display RAM (VDR), 
also called video buffer. In order for the information to be displayed, it must be written 
first into video RAM by the CPU; then it is the job of the video adapter's controller 
(processor) to read the information from video RAM and convert it to the appropriate sig- 
nals to be displayed on the screen. In other words, there is a separate controller, often 
called a CRT controller or video processor, apart from the main x86 CPU. The video con- 
troller's sole job is to take care of the video section of the computer. Since the CRT con- 
troller is built specifically for that purpose, it can perform the tasks associated with video 
much more efficiently than can a CPU such as the x86. That also means that video RAM 
must be accessible to both the main processor and the video processor. In the IBM PC 
and compatibles, 128K bytes of the 1 megabyte of addressable memory, from address 
A0000H to BFFFFH, is set aside for video display RAM. Of this 128K bytes of memo- 
ry, only some is used; the amount depends on the resolution of the video adapter and the 
selected mode: text mode or graphics. For example, when displaying text the IBM mono- 
chrome adapter uses only 4K bytes of memory, starting at memory address B0000H. Of 
this 4K bytes, 2K bytes are for the full screen of characters (80 characters per line and 25 
lines per screen = 2000 bytes) and another 2000 bytes are for the attributes. The attribute 
byte provides various information, such as color, intensity, and blinking, to the video cir- 
cuitry. As will be seen later, every time a byte of character data is accessed, its attribute is 
automatically fetched as well. The memory requirement of each video board and the num- 
ber of colors that it can handle will be given in the next section. 
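
The layout just described can be sketched in a few lines of Python. This is only an illustration, not BIOS code: it assumes the 80 x 25 text screen and the arrangement discussed in this chapter, in which each character byte sits at an even address and its attribute byte at the following odd address. The function name text_cell_address is hypothetical.

```python
# Sketch: locate a character cell in the text-mode video buffer.
# Assumes 80 columns per row and 2 bytes per cell (character, then attribute).
MDA_BASE = 0xB0000   # start of the MDA text buffer, as given in the text
COLUMNS = 80

def text_cell_address(row, col, base=MDA_BASE):
    """Return (character_address, attribute_address) for a 0-based row and column."""
    offset = (row * COLUMNS + col) * 2
    return base + offset, base + offset + 1

char_addr, attr_addr = text_cell_address(24, 79)   # last cell on the screen
print(hex(char_addr), hex(attr_addr))
```

The last cell ends just below the 4000-byte mark, which is why the 4K bytes mentioned above are enough for one full screen of characters and attributes.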

If the same video display RAM is accessed by both the microprocessor and the 
video controller, how can two masters access the same RAM at the same time? There are 
several solutions to this dilemma. 


1. The CPU can access the video RAM only during the time when the video con- 
troller is doing the retrace. 

2. Use a more expensive, specially designed kind of RAM called VRAM (video 
RAM). This kind of RAM allows the transfer of data by the video controller at a 
much higher rate than is allowed by normal DRAM. 

3. Another approach, used in some high-performance graphics systems, is to use dual- 
port RAMs. This kind of RAM has two sets of data pins, allowing both the CPU 
and video controller to access the video RAM with much less conflict, since it 
eliminates the time wasted by a multiplexer. 


It must be noted that if the CPU tries to access the VDR while the video controller 
is accessing it, the CPU is blocked since the screen must be refreshed by the video con- 
troller before it is lost. In other words, the video controller has a higher priority than the 
main CPU in accessing the VDR. If by software manipulation one blocks the video con- 
troller's access to the VDR, it will result in snow on the screen. 

Video systems have improved dramatically in recent years, due to the fact that the 
speed of the CPU has reached 5 GHz and can therefore transfer data from (or to) the VDR 
at a much faster rate during the retrace time. 


Character box 


Video boards can be programmed in two modes: text and graphics. While in 
graphics mode the individual pixels are accessed and manipulated, in text mode charac- 
ters, which are a group of pixels, are accessed. In text mode, horizontal and vertical pix- 
els are grouped into what are called character boxes. Each character box can display a sin- 
gle character. The size of the character box matrix varies from adapter to adapter. For 
example, IBM's MDA (monochrome display adapter) has a 9 pixel by 14 pixel character 




Example 16-3 


If the MDA character box is 9 x 14 (9 pixels wide and 14 pixels high) and the resolution of MDA 
is 720 x 350, verify the fact that MDA in text mode can display 80 x 25 characters per screen. 


Solution: 


720 horizontal pixels divided by 9, the width of the character box, gives 80 columns of characters. 
Dividing 350 vertical pixels by 14, the height of the character box, results in 25 rows of characters. 


Example 16-4 


In a given adapter, the character box is 8 x 14 and the adapter in text mode displays 80 x 25 
characters. Calculate the pixel resolution. 


Solution: 


The total number of horizontal pixels is 640 (8 x 80) and the vertical number is 350 (14 x 25). 
Therefore, it has 640 x 350 resolution. 


[Figure: seven adjacent character boxes, labeled Character 1 through Character 7] 


Figure 16-2. Character Boxes 


box. Since every character is 9 pixels wide and 14 pixels high, one can calculate the num- 
ber of character columns per screen by dividing the number of pixels on a horizontal line 
by 9, and can calculate the number of rows per screen by dividing the number of pixels 
on a vertical line by 14. Conversely, one can calculate the horizontal and vertical pixels 
by using the size of the character box, the number of rows, and the number of columns per 
screen: 


pixels per scan line = number of character columns x pixel width of char. box 
raster lines = number of rows per screen x pixel height of char. box 


See Examples 16-3 and 16-4. To get better-looking characters, the character box 
size must be increased, which translates to more pixels horizontally and vertically. From 
the above discussion, one can conclude that for a fixed-size monitor to display a fixed 
number of rows and columns of characters, the number of horizontal and vertical pixels 
is the most important factor. Since the number of horizontal and vertical pixels is directly 
proportional to the horizontal and vertical dot frequencies, in judging CRT monitors one 
must look for higher HF (horizontal frequency), VF (vertical frequency), and DF (dot fre- 
quency). Figure 16-2 illustrates the character box. 
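
The two relations above are simple enough to check with a short Python sketch. The function names text_grid and pixel_resolution are hypothetical, chosen only for this illustration; the numbers are those of Examples 16-3 and 16-4.

```python
# Sketch: character box arithmetic from this section.

def text_grid(h_pixels, v_pixels, box_width, box_height):
    """Columns and rows of characters that fit on the screen."""
    return h_pixels // box_width, v_pixels // box_height

def pixel_resolution(columns, rows, box_width, box_height):
    """Pixels per scan line and number of raster lines."""
    return columns * box_width, rows * box_height

print(text_grid(720, 350, 9, 14))        # MDA, Example 16-3
print(pixel_resolution(80, 25, 8, 14))   # Example 16-4
```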


x86 PC video modes 


When the original IBM PC was introduced in 1981, it had two video monitor 
options: MDA (monochrome display adapter) and CGA (color graphics adapter). While 
CGA allowed both graphics and text mode options, MDA allowed only text mode. 
Consequently, CGA had color for both graphics and text, but the text was not very crisp. 
On the other hand, MDA had excellent text, but did not support graphics. Not until 1985 
was EGA (enhanced graphics adapter) introduced in order to provide both graphics and 
text on the same monitor. In 1987, IBM introduced the new video standards called VGA 




(video graphics array) and MCGA (multicolor graphics array). In MDA and CGA, IBM 
used the Motorola 6845 CRT controller to design the adapter board, and in EGA used a 
set of proprietary LSI chips, but in all three cases the adapter board had to be plugged into 
one of the expansion slots. Later on, IBM put the video circuitry on the motherboard. 
Currently, some manufacturers make high-performance VGA-compatible adapter plug-in 
boards for graphic-intensive video games. 


VGA (video graphics array) 


VGA is a single-chip video controller designed by IBM that performs many tasks 
previously done by several chips in EGA. However, many use the term VGA to refer to 
the entire adapter. It has excellent resolution of up to 720 x 400 for text modes and 640 
x 480 for graphics modes. In today’s PC, VGA is already on the motherboard, but one can 
also purchase an adapter board for high-performance video games to be plugged into one 
of the expansion slots of the x86 PC. Table 16-1 shows the history of early video modes. 


   D7   D6   D5   D4   D3   D2   D1   D0 
   B    R    G    B    I    R    G    B 
        (background)        (foreground) 

B = blinking, I = intensity 
Both blinking and intensity are applied to foreground only. 


Figure 16-3. CGA Text Mode Attribute Byte 


Table 16-3 shows the various video modes. Notice that VGA emulates all the modes of 
CGA, MDA, and EGA, plus some new modes that are not available with the earlier 
adapters. Figure 16-3 and Table 16-2 show the various color options available in text 
mode. Figure 16-4 shows the text mode attributes in black and white. See Examples 16-5 
and 16-6. 


Table 16-2: The 16 Possible Colors 

   I   R   G   B    Color 
   0   0   0   0    Black 
   0   0   0   1    Blue 
   0   0   1   0    Green 
   0   0   1   1    Cyan 
   0   1   0   0    Red 
   0   1   0   1    Magenta 
   0   1   1   0    Brown 
   0   1   1   1    White 
   1   0   0   0    Gray 
   1   0   0   1    Light blue 
   1   0   1   0    Light green 
   1   0   1   1    Light cyan 
   1   1   0   0    Light red 
   1   1   0   1    Light magenta 
   1   1   1   0    Yellow 
   1   1   1   1    High-intensity white 


Video memory and attributes in VGA 


Up to 1 megabyte of DRAM can be installed on VGA boards. This extra memory is used 
to store pixels and their attributes. Since VGA can display up to 256 colors out of 
262,144 possible colors at once in graphics mode, it requires 1 megabyte of DRAM to 
store them. How the 1M of memory is mapped into the 128K-byte address space 
A0000H-BFFFFH is discussed in Section 16.3, which covers graphics programming. 

When VGA is programmed to emulate CGA text, the address for the video is B8000H, 
but for MDA emulation the address is B0000H. This is in order to be compatible with 
previous adapters. When VGA is in text mode it uses mode 3, as we will see in the next 
section. 




Table 16-3: Video Modes and Their Definitions 


AL        Pixels      Chars     Box      Type       Colors   Adapter              Buffer start 
00H       320 x 200   40 x 25   8 x 8    Text       16*      CGA                  B8000H 
          320 x 350   40 x 25   8 x 14   Text       16*      EGA                  B8000H 
          360 x 400   40 x 25   9 x 16   Text       16*      VGA                  B8000H 
          320 x 400   40 x 25   8 x 16   Text       16*      MCGA                 B8000H 
01H       320 x 200   40 x 25   8 x 8    Text       16       CGA                  B8000H 
          320 x 350   40 x 25   8 x 14   Text       16       EGA                  B8000H 
          360 x 400   40 x 25   9 x 16   Text       16       VGA                  B8000H 
          320 x 400   40 x 25   8 x 16   Text       16       MCGA                 B8000H 
02H       640 x 200   80 x 25   8 x 8    Text       16*      CGA                  B8000H 
          640 x 350   80 x 25   8 x 14   Text       16*      EGA                  B8000H 
          720 x 400   80 x 25   9 x 16   Text       16*      VGA                  B8000H 
          640 x 400   80 x 25   8 x 16   Text       16*      MCGA                 B8000H 
03H       640 x 200   80 x 25   8 x 8    Text       16       CGA                  B8000H 
          640 x 350   80 x 25   8 x 14   Text       16       EGA                  B8000H 
          720 x 400   80 x 25   9 x 16   Text       16       VGA                  B8000H 
          640 x 400   80 x 25   8 x 16   Text       16       MCGA                 B8000H 
04H       320 x 200   40 x 25   8 x 8    Graphics   4        CGA,EGA,VGA,MCGA     B8000H 
05H       320 x 200   40 x 25   8 x 8    Graphics   4*       CGA,EGA,VGA,MCGA     B8000H 
06H       640 x 200   80 x 25   8 x 8    Graphics   2        CGA,EGA,VGA,MCGA     B8000H 
07H       720 x 350   80 x 25   9 x 14   Text       Mono     MDA,EGA              B0000H 
          720 x 400   80 x 25   9 x 16   Text       Mono     VGA                  B0000H 
08H - 0CH not used 
0DH       320 x 200   40 x 25   8 x 8    Graphics   16       EGA,VGA              A0000H 
0EH       640 x 200   80 x 25   8 x 8    Graphics   16       EGA,VGA              A0000H 
0FH       640 x 350   80 x 25   8 x 14   Graphics   Mono     EGA,VGA              A0000H 
10H       640 x 350   80 x 25   8 x 14   Graphics   4        EGA                  A0000H 
          640 x 350   80 x 25   8 x 14   Graphics   16       EGA,VGA              A0000H 
11H       640 x 480   80 x 30   8 x 16   Graphics   2        VGA,MCGA             A0000H 
12H       640 x 480   80 x 30   8 x 16   Graphics   16       VGA                  A0000H 
13H       320 x 200   40 x 25   8 x 8    Graphics   256      VGA,MCGA             A0000H 

* Color burst off. 




   D7   D6   D5   D4   D3   D2   D1   D0 

D7: Background intensity 
       0 = nonblinking 
       1 = blinking 
D6-D4: Background 
D3: Foreground intensity 
       0 = normal intensity 
       1 = highlighted intensity 
D2-D0: Foreground 


Figure 16-4. Attribute Byte in MDA 


Example 16-5 


Using Figure 16-3, find the attribute byte (in binary and hex) for the following color options: 


(a) blue on black (b) green on blue (c) high-intensity white on blue 
(d) red on blue 


Solution: 


      Binary       Hex    Color Effect 
(a)   0000 0001    01H    Blue on black 
(b)   0001 0010    12H    Green on blue 
(c)   0001 1111    1FH    High-intensity white on blue 
(d)   0001 0100    14H    Red on blue 
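
The same attribute bytes can be assembled bit by bit. The following Python sketch assumes the CGA text attribute layout of Figure 16-3 (bit 7 = blinking, bits 6-4 = background RGB, bit 3 = intensity, bits 2-0 = foreground RGB) and the color codes of Table 16-2; the function name cga_attribute and the dictionary COLORS are hypothetical names used only for illustration.

```python
# Sketch: build a CGA text-mode attribute byte from its fields.
COLORS = {"black": 0, "blue": 1, "green": 2, "cyan": 3,
          "red": 4, "magenta": 5, "brown": 6, "white": 7}

def cga_attribute(foreground, background, intensity=False, blink=False):
    """Return the attribute byte for the given foreground and background colors."""
    return ((int(blink) << 7) | (COLORS[background] << 4)
            | (int(intensity) << 3) | COLORS[foreground])

for fg, bg, hi in [("blue", "black", False), ("green", "blue", False),
                   ("white", "blue", True), ("red", "blue", False)]:
    value = cga_attribute(fg, bg, intensity=hi)
    print(f"{fg} on {bg}: {value:08b} = {value:02X}H")
```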


Example 16-6 


Find the attributes associated with the following attribute bytes in MDA. 

(a) 07H (b) 0FH (c) 70H 

Solution: 

(a) 07H = 00000111 gives background black, foreground normal intensity, nonblinking. 
(b) 0FH = 00001111 gives the same as (a) except with foreground highlighted. 


(c) 70H = 01110000 gives black on white, a reverse video screen mode in which the foreground is 
black and the background is white, nonblinking. 


Review Questions 


1. The way images are displayed on the monitor screen is referred to as ________. 

2. Increasing the dot frequency (DF) but keeping HF and VF constant will increase 
the number of (horizontal, vertical) scan lines. 

3. True or false. The smaller the dot pitch, the better the monitor. 

4. Of the three frequencies DF, HF, and VF, state which has the: 




(a) highest frequency (b) lowest frequency 

5. True or false. For a fixed-size monitor design, one must increase the dot pitch to 
get more pixels. 

6. True or false. For any information to be displayed, it must be stored in the VDR. 

7. True or false. The VDR memory address range (memory space) must be accessible 
to both the main CPU and the video processor. 


8. True or false. To display crisper characters, one must design more pixels into the 
character box. 


SECTION 16.2: TEXT MODE PROGRAMMING AND VIDEO 
RAM 


As we showed in Chapter 4, we can use BIOS INT 10H to program video modes 
of the x86 using Assembly language instructions. In this section we revisit text and graph- 
ic mode programming. At the end of this section, character generator ROM is discussed. 


Finding the current video mode 
To find the current video mode, set AH = 0F and use INT 10H as follows: 

        MOV   AH,0FH      ;AH=0F 
        INT   10H 


Reminder: DEBUG assumes that numbers are in hex. If you are assembling the 
above program in DEBUG, make sure to remove the H, and also put INT 3 as the last 
instruction. 


Changing the video mode 


To change the video mode, use INT 10H with AH = 00 and AL = video mode. A 
list of video modes is given in Table 16-3. Regardless of what mode is selected, MDA, 
CGA, EGA, and MCGA all are supported by the VGA monitor. Examples 16-7 and 16-8 
show the use of INT 10H functions. 


Setting the cursor position (AH = 02) 


The cursor position is set by making AH = 2, DH = row, and DL = column. The 
following sets the cursor to row 12, column 28. 


        MOV   AH,02       ;set the cursor 
        MOV   BH,0        ;page 0 
        MOV   DH,12       ;row 12 
        MOV   DL,28       ;column 28 
        INT   10H         ;invoke interrupt 


Getting the current cursor position (AH = 03) 
To get the current cursor position, AH = 03 of INT 10H must be used: 


MOV AH,03 ;get cursor position 
MOV BH,0 ;page 0 
INT 10H 


After running the above code, registers DH and DL have the row and column 




Example 16-7 


Write the following program using INT 10H to: 
(a) change the video mode to 03 
(b) display letter "A" in 80 locations with the attributes of red on blue 
(c) use the DEBUG utility to run and verify the program 
(d) use DEBUG to dump the video RAM contents 


Solution: 
(a) MOV AH,00 ;set mode option 
MOV AL, 03 ;CGA text mode of 80x25 


INT 10H ;monitor is monochrome 


(b) Using INT 10H function AH = 09, one can display a character a certain number of times with 
specific attributes. 


        MOV   AH,09       ;display option 
        MOV   BH,00       ;page 0 
        MOV   AL,41H      ;ASCII for letter "A" 
        MOV   CX,50H      ;repeat it 80 (50H) times 
        MOV   BL,14H      ;red on blue 
        INT   10H 
(c) Reminder: DEBUG assumes that all the numbers are in HEX. 


C:\>debug 
-A 
1131:0100 MOV AH,00 
1131:0102 MOV AL,03 
1131:0104 INT 10 
1131:0106 MOV AH,09 
1131:0108 MOV BH,00 
1131:010A MOV AL,41 
1131:010C MOV CX,50 
1131:010F MOV BL,14 
1131:0111 INT 10 
1131:0113 INT 3 
1131:0115 


Now see the result by typing in command -G. Make sure that IP = 100 before you run it. 


(d) When EGA and VGA monitors emulate CGA in text mode, the video memory address starts at 
B8000H. Dumping memory immediately after running the above program gives the following. 
Notice the character and the attribute byte stored in even and odd addresses. 


-D B800:0 4F 
B800:0000 41 
B800:0010 41 
B800:0020 41 
B800:0030 41 
B800:0040 41 
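
To see why the dump alternates between two values, the buffer contents can be modeled outside the PC. The Python sketch below is only a model of the video RAM as a byte string, not a way of calling INT 10H; it assumes the character byte comes first (even address) and the attribute byte second (odd address), and the names fill_cells and dump_line are hypothetical.

```python
# Sketch: model what AH = 09 leaves in the text buffer for 80 copies of "A"
# with attribute 14H (red on blue), then format 16 bytes the way DEBUG does.

def fill_cells(char, attribute, count):
    """Return count character/attribute pairs as they would sit in video RAM."""
    return bytes([ord(char), attribute]) * count

def dump_line(buffer, start=0):
    """Hex text for the 16 bytes beginning at offset start."""
    return " ".join(f"{b:02X}" for b in buffer[start:start + 16])

video = fill_cells("A", 0x14, 80)
print(len(video), "bytes used")
print(dump_line(video))
```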




Example 16-8 


Modify Example 16-7 using mode 7 to emulate MDA. Use attribute 87H (white on black blinking). 


Solution: 


The following shows the code as it would be entered in DEBUG. 
1131:0100 MOV AH,00 
1131:0102 MOV AL,07 
1131:0104 INT 10 
1131:0106 MOV AH,09 
1131:0108 MOV BH,00 
1131:010A MOV AL,41 
1131:010C MOV CX,50 
1131:010F MOV BL,87 
1131:0111 INT 10 
1131:0113 INT 3 
1131:0115 


After running the above program, we dump the MDA video buffer starting at memory address 
B000:0 and see the data and attribute bytes. 


-D B000:0 F 
B000:0000 41 87 41 87 41 87 41 87-41 87 41 87 41 87 41 87 A.A.A.A.A.A.A.A. 


positions of the cursor (in hex), respectively. 


Scrolling the window up to clear the screen (AH = 06) 


The options 06 and 07 are called scroll functions. They are used to scroll a part 
or all of the screen up or down. One of the most widely used applications of option AH = 
06 is to clear the screen, as shown next. 


        MOV   AH,06       ;select the scroll up function 
        MOV   AL,0        ;the entire screen 
        MOV   BH,07       ;normal attribute 
        MOV   CL,0        ;col 0 (top left col) 
        MOV   CH,0        ;row 0 (top left row) 
        MOV   DL,79       ;col 79 (bottom right col) 
        MOV   DH,24       ;row 24 (bottom right row) 
        INT   10H 


A more efficient version of the above code is 


        MOV   AX,0600H 
        MOV   BH,07 
        MOV   CX,0 
        MOV   DX,184FH    ;18H=24 and 4FH=79 
        INT   10H 


Character generator ROM 


To display characters on the screen, every video board must have access to the 
pixel patterns of the characters. In CGA, the patterns are burned into BIOS ROM starting 
at F000:FA6EH. To decipher the patterns, first remember that the CGA character box is 
8 x 8. Therefore, for every ASCII character there must be 64 (8 x 8 = 64) bits for each 




pattern. This means that every 8 bytes of the ROM provides the pattern for one character. 
Now let's see the patterns by using DEBUG. 


A> DEBUG 

-D F000:FA60 L4F 

F000:FA60 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF 00 00 
F000:FA70 00 00 00 00 00 00 7E 81-A5 81 BD 99 81 7E 7E FF 
F000:FA80 DB FF C3 E7 FF 7E 6C FE-FE FE 7C 38 10 00 10 38 
F000:FA90 7C FE 7C 38 10 00 38 7C-38 FE FE 7C 38 7C 10 10 
F000:FAA0 38 7C FE 7C 38 7C 00 00-18 3C 3C 18 00 00 FF 

Starting with FA6E, the first 8 bytes are for blank (null), resulting in all 00s. The 
second 8 bytes are for the patterns for happy face, and so on. Inspecting the contents of 
ROM BIOS reveals the following character definition table. 


Address      Patterns in Hex            ASCII        Hex   Dec 
F000:FA6E    00,00,00,00,00,00,00,00    NULL         00    00 
F000:FA76    7E,81,A5,81,BD,99,81,7E    HAPPY FACE   01    01 
F000:FA7E    7E,FF,DB,FF,C3,E7,FF,7E                 02    02 

F000:FBEE    7C,C6,CE,DE,F6,E6,7C,00    0            30    48 
F000:FBF6    30,70,30,30,30,30,FC,00    1            31    49 

F000:FC36    78,CC,CC,7C,0C,18,70,00    9            39    57 
F000:FC76    30,78,CC,CC,FC,CC,CC,00    A            41    65 
F000:FC7E    FC,66,66,7C,66,66,FC,00    B            42    66 
F000:FD3E    FE,C6,8C,18,32,66,FE,00    Z            5A    90 
F000:FD76    00,00,78,0C,7C,CC,76,00    a            61    97 
F000:FD7E    E0,60,60,7C,66,66,DC,00    b            62    98 
F000:FE36    00,00,CC,CC,CC,7C,0C,F8    y            79    121 
F000:FE3E    00,00,FC,98,30,64,FC,00    z            7A    122 
F000:FE66    00,10,38,6C,C6,C6,FE,00    DELTA        7F    127 


For example, for the happy face character we have 7E, 81, A5, 81, BD, 99, 81, 
and 7E for the hex patterns of the bits that form the character on the screen. 
Example 16-9 demonstrates this. 
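
As a rough illustration of how the table is organized, the Python sketch below computes where the pattern of a given ASCII code begins (8 bytes per character, starting at offset FA6EH in segment F000H) and turns the 8 pattern bytes into rows of dots. It works on the byte values listed above rather than reading the ROM, and the names pattern_offset and render are hypothetical.

```python
# Sketch: locate and draw an 8 x 8 CGA character pattern.
CGA_FONT_OFFSET = 0xFA6E   # offset of the NULL pattern in segment F000H
BYTES_PER_CHAR = 8         # one byte per row of the 8 x 8 character box

def pattern_offset(ascii_code):
    """Offset (within F000H) of the first pattern byte for a character."""
    return CGA_FONT_OFFSET + BYTES_PER_CHAR * ascii_code

def render(rows):
    """Turn pattern bytes into text rows: '#' for a lit pixel, '.' for a dark one."""
    return [f"{b:08b}".replace("1", "#").replace("0", ".") for b in rows]

HAPPY_FACE = [0x7E, 0x81, 0xA5, 0x81, 0xBD, 0x99, 0x81, 0x7E]
print(f"Pattern for 'A' starts at F000:{pattern_offset(0x41):04X}")
print("\n".join(render(HAPPY_FACE)))
```

Each byte is one scan line of the character box, with bit 7 as the leftmost pixel.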


How characters are displayed in text mode 


To see how characters are displayed on the screen, analyze the MDA block dia- 
gram shown in Figure 16-6. For CGA, the process of generating the signals is exactly the 
same as in MDA except that the video circuitry generates R (red), B (blue), G (green), as 
well as vertical, horizontal, intensity, and dot pattern signals. 


Example 16-9 


Draw the patterns for the happy face (01H) and letter "A" (41H) on an 8 x 8 box. 


Solution: 


See Figure 16-5. 




Happy face                 Letter "A" 
Hex   Binary               Hex   Binary 

7E    01111110             30    00110000 
81    10000001             78    01111000 
A5    10100101             CC    11001100 
81    10000001             CC    11001100 
BD    10111101             FC    11111100 
99    10011001             CC    11001100 
81    10000001             CC    11001100 
7E    01111110             00    00000000 


Figure 16-5. Diagram for Example 16-9 


The function of the multiplexer in Figure 16-6 is to allow access to the video 
RAM by both the CPU and the 6845 CRT controller. The character generator ROM has 
patterns for all the ASCII characters. We just examined the content of this ROM holding 
the CGA patterns. First, the 6845 CRT controller is initialized by the CPU. Then to dis- 
play characters, the x86 CPU writes the characters and their attributes into the video 
RAM. The job of the CRT controller is to fetch the characters and send them to the char- 
acter generator ROM to get the patterns of every character on each row for the scan lines. 
As shown in Figure 16-6, the RA0-RA4 (row address) output pins from the 6845 fetch 
the specific row of the character and send it to a parallel-in-serial-out register (often 
referred to as a serializer). The job of this register is to provide the patterns to the video 
circuitry, one at a time, to be output serially to the monitor with the attribute. 


[Block diagram. Labeled blocks and signals: microprocessor address bus (A0-A11); 
MA0-MA11; multiplexer; 2K bytes RAM for characters (even addresses); 2K bytes RAM 
for attributes (odd addresses); microprocessor data bus; 8-bit latches; character 
generator; attribute decoder; parallel-in/serial-out register; video process circuitry; 
character clock; outputs: video dot patterns, intensity, horiz. sync. (pin 8 of 
connector), vert. sync. (pin 9 of connector)] 


Figure 16-6. Original IBM PC Monochrome Block Diagram 




Character definition table in VGA 


In the discussion of CGA, the character definition table was examined by inspect- 
ing the contents of character generator BIOS ROM. The character box in CGA is 8 x 8 
and as a result, the text is not very sharp. In VGA, the character box is 8 x 16 and the pat- 
terns for all the characters are stored in ROM memory. The address for that memory varies 
from computer to computer. To get the address of the character definition table, use INT 
10H with AH = 11H, AL = 30H, and BH = 06. On return from INT 10H, ES:BP has the 
address. The number of bytes used to form patterns for each character is given in CX, and 
DL has the row number minus one. This is shown in Example 16-11. Example 16-10 dia- 
grams two characters for VGA. 


Example 16-10 


Draw the patterns for VGA characters of happy face (01H) and letter "A" (41H) on a 9 x 16 box. 
Contrast this with CGA in Example 16-9. 


Solution: 


The patterns for these characters are as follows. See Example 16-11 for how to get the patterns. 
Figure 16-7 shows the diagram. 

00,00,7E,81,A5,81,81,BD,99,81,81,7E,00,00,00,00 happy face 

00,00,10,38,6C,C6,C6,FE,C6,C6,C6,C6,00,00,00,00 A 


Figure 16-7. Diagram for Example 16-10 


Review Questions 


1. What colors are defined by the following attribute bytes in CGA text mode? 
   (a) 0 (b) 14H 
2. The characters displayed are the ________ (foreground, background). 
3. When VGA emulates CGA, it uses what starting address for the video buffer? 
4. When VGA emulates MDA, it uses what starting address for the video buffer? 
5. True or false. The patterns for CGA characters are provided in BIOS ROM. 
6. True or false. The patterns for VGA characters are provided in BIOS ROM. 




Example 16-11 


Use DEBUG to find the address of the character definition table of a given VGA board. 


Solution: 
C:\>DEBUG 
-A 
17D9:0100 MOV AH,11 
17D9:0102 MOV AL,30 
17D9:0104 MOV BH,06 
17D9:0106 INT 10 
17D9:0108 INT 3 
17D9:0109 
-G 

AX=XXXX BX=XXXX CX=0010 DX=0018 SP=XXXX BP=61E7 SI=XXXX DI=XXXX 
DS=17D9 ES=E000 SS=XXXX 
17D9:0108 CC INT 3 


CX = 10H gives 16 bytes used for each character. DL = 18H = 24, which indicates that there are a 
total of 25 rows of characters per screen. Dumping the contents of memory location at ES:BP, which 
is E000:61E7, shows the following pattern: 


-D E000:61E7 
E000:61E0                      00-00 00 00 00 00 00 00 00 
E000:61F0 00 00 00 00 00 00 00 00-00 7E 81 A5 81 81 BD 99 
E000:6200 81 81 7E 00 00 00 00 00-00 7E FF DB FF FF C3 E7 
E000:6210 FF FF 7E 00 00 00 00 00-00 00 00 6C FE FE FE FE 


and so on. 
Going through these memory locations reveals: 


BYTE PATTERNS 
00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00 null 
00,00,7E,81,A5,81,81,BD,99,81,81,7E,00,00,00,00 happy face 


00,00,38,6C,C6,C6,D6,D6,D6,C6,C6,6C,38,00,00,00 0 


00,00,18,38,78,18,18,18,18,18,18,7E,00,00,00,00 1 
00,00,7C,C6,06,0C,18,30,60,C0,C6,FE,00,00,00,00 2 


00,00,10,38,6C,C6,C6,FE,C6,C6,C6,C6,00,00,00,00 A 
00,00,FC,66,66,66,7C,66,66,66,66,FC,00,00,00,00 B 


and so on. 
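
Once ES:BP and CX are known, finding the pattern for any character is a matter of simple address arithmetic. The Python sketch below assumes the values returned in this example (table at offset 61E7H, 16 bytes per character); on another machine the table offset will differ. The function name vga_pattern_offset is hypothetical.

```python
# Sketch: offset of a character's pattern inside the VGA character table.
TABLE_OFFSET = 0x61E7    # BP returned by INT 10H in this example
BYTES_PER_CHAR = 0x10    # CX returned by INT 10H (16 bytes per character)

def vga_pattern_offset(ascii_code, table_offset=TABLE_OFFSET,
                       bytes_per_char=BYTES_PER_CHAR):
    """Offset (within the table's segment) of the first pattern byte."""
    return (table_offset + ascii_code * bytes_per_char) & 0xFFFF

for code in (0x00, 0x01, 0x41):
    print(f"character {code:02X}H starts at E000:{vga_pattern_offset(code):04X}")
```

The result for character 01H agrees with the dump above, where the happy face pattern begins 16 bytes after the start of the table.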




SECTION 16.3: GRAPHICS AND GRAPHICS PROGRAMMING 


In all the video programming examples given so far, characters have been used as 
units to be addressed and a character was treated as a group of pixels. In this section, pro- 
gramming individual pixels will be discussed. In graphics mode, pixel accessing is also 
referred to as bit-mapped graphics. First, the relationship between pixel resolution, the 
number of colors supported, and the amount of video memory in a given video board is 
clarified. 


Graphics: pixel resolution, color, and video memory 


There are two facts associated with every pixel on the screen: 


1. The location of the pixel 
2. Its attributes: color and intensity 


These two facts must be stored in the video RAM. The higher the number of pix- 
els and colors options, the larger the amount of memory that is needed to store them. In 
other words, the memory requirement goes up as the resolution and the number of colors 
supported go up. The number of colors displayed at one time is always 2^n where n is the 
number of bits set aside for the color. For example, when 4 bits are assigned for the color 
of the pixel, this allows 16 combinations of colors to be displayed at one time because 2^4 
= 16. See Example 16-12. The relation between the video memory, resolution, and color 
for each video adapter is discussed separately. 


Example 16-12 


In certain video graphics, a maximum of 256 colors can be displayed at one time. How many bits 
are set aside for the color of the pixels? 


Solution: 


To display 256 colors at once, we must have 8 bits set for color since 2^8 = 256. 


The case of CGA 


The CGA board can have a maximum of 16K bytes of video memory since the 
6845 chip used in the video design of the original PC had only 14 address pins (2^14 = 
16K). This 16K bytes of memory can hold up to 4 pages of data, where each page repre- 
sents one full screen of 80 x 25 characters. In graphics mode, the number of colors sup- 
ported varies depending on the resolution, as shown next. 


320 x 200 (Medium resolution) 


In this mode there are a total of 64,000 pixels (320 columns x 200 rows = 64,000). 
Dividing the total video RAM memory of 128K bits (16K x 8 bits = 128K bits) by the 
64,000 pixels gives 2 bits for the color of each pixel. These 2 bits give rise to 4 colors 
since 2^2 = 4. Therefore, the 320 x 200 resolution CGA can support only up to 4 different 
colors at a time. See Figure 16-8. These 4 colors can be selected from a palette of 16 pos- 
sible colors. To select this mode, use set mode option AH = 0 of INT 10H with AL = 04 
for mode. After setting the video mode to AL = 04, we must use option 0BH of INT 10H 
to select the color of the pixel displayed on the screen. 


640 x 200 (High resolution) 


In this mode there are a total of 128,000 pixels (200 x 640 = 128,000). Dividing 
the 16K bytes of memory by this gives 1 bit (128,000/128,000 = 1) for color. The bit can 
be on (white) or off (black). Therefore, the 640 x 200 high-resolution CGA can support 




[Figure: bit assignments within one byte of video memory (bits 7 through 0) for the 
640x200 2-color mode and the 320x200 4-color mode] 


Figure 16-8. CGA Pixel Mapping 


only two colors: black and white. To select this mode, use set mode option AH = 0 of 
INT 10H with AL = 06 for mode. 

Note that for a fixed amount of video RAM, as the resolution increases, the num- 
ber of supported colors decreases. The 16-color 160 x 100 low resolution is used with 
color TV sets. 
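
The trade-off between resolution and color for the two CGA graphics modes above can be reproduced with a short calculation. In this Python sketch the function name colors_supported is hypothetical, and the 16K bytes is taken as 16 x 1024 bytes; the integer division mirrors the rounding done in the text.

```python
# Sketch: bits per pixel and colors available from a fixed amount of video RAM.
CGA_MEMORY_BYTES = 16 * 1024

def colors_supported(width, height, memory_bytes=CGA_MEMORY_BYTES):
    """Return (bits_per_pixel, number_of_colors) for a given resolution."""
    bits_per_pixel = (memory_bytes * 8) // (width * height)
    return bits_per_pixel, 2 ** bits_per_pixel

print(colors_supported(320, 200))   # medium resolution
print(colors_supported(640, 200))   # high resolution
```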


The case of EGA 


In the EGA board, the memory buffer was increased to a maximum of 256K 
bytes. This allowed both the number of colors and the number of pixels supported in 
graphics mode to increase. Although EGA can have up to 64 colors, only 16 of them can 
be displayed on the screen at a time. This is in contrast to CGA, which displayed only 4 
colors of a 16-color palette. EGA graphics memory starts at A0000H and goes to a maxi- 
mum of AFFFFH, using only 64K bytes of the PC's memory space. How is this 256K 
bytes of memory accessed through a 64K-byte address window? To solve this problem, 
IBM designers used four parallel planes, each 64K bytes, to access the entire 256K bytes 
of video RAM. In this scheme, each plane holds one bit of the 4-bit color. The assignment 
of 4 bits for color allows a maximum of 16 colors to be displayed at any given time. In 
the EGA card, IBM introduced what are called palette registers. There are a total of 16 
palette registers in the EGA, each holding 8 bits. EGA uses only 6 bits out of the 8 bits of 
the palette register, giving rise to a maximum of 64 hues. 
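
The idea of four parallel planes can be illustrated with a small model. The Python sketch below takes one byte from each plane at the same address and combines them into eight 4-bit pixel values. It assumes that plane 0 supplies the least significant bit of the color and that bit 7 of each byte belongs to the leftmost pixel; the function name pixels_from_planes is hypothetical.

```python
# Sketch: combine one byte from each of the four planes into 8 pixel values.

def pixels_from_planes(plane_bytes):
    """plane_bytes holds four bytes, plane 0 first. Returns 8 values, left to right."""
    pixels = []
    for bit in range(7, -1, -1):               # bit 7 is the leftmost pixel
        value = 0
        for plane, byte in enumerate(plane_bytes):
            value |= ((byte >> bit) & 1) << plane
        pixels.append(value)
    return pixels

print(pixels_from_planes([0b10101010, 0b11001100, 0b11110000, 0b00000000]))
print(2 ** 6, "hues from the 6 bits used in each palette register")
```

Each pixel value (0 to 15) selects one of the 16 palette registers, and the 6 bits stored there select one of the 64 hues.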


Video memory size and color relation for EGA 


In EGA, to support 640 x 350 pixels with 16 colors requires a minimum of 640 x 
350 x 4 = 896,000 bits of memory, but because of the concept of the plane and the 64K- 
byte address space of A0000H-AFFFFH, the memory must be 256K bytes, although some 
portions of video memory are unused. 

In EGA, one can use 64K bytes for the video RAM, making only 16K bytes avail- 
able for each color plane, but this reduces the number of colors supported. EGA is down- 
wardly compatible with CGA in graphics mode, the same as in text mode. To program the 
palette registers of the EGA, use option AH = 10H of INT 10H. INT 10H has many options 
for pixel programming of EGA and VGA. 


The case of VGA 


In VGA, the number of pixels was increased to 640 x 480 with support for 256 
colors displayed at one time. The color palette was increased to 2^18 = 262,144 hues. The 
number of palette registers was also increased to 256. Each palette register holds 18 bits, 
6 bits for each of the red, green, and blue colors. VGA was the first analog monitor intro- 
duced by IBM. All previous monitors were digital. In analog VGA, the analog colors of 
red, green, and blue replace the digital red, green, and blue of the digital display, allow- 
ing substantial increases of the number of colors supported. This gives rise to the use of 
what is called a video DAC (digital-to-analog converter). Each color of red, green, and 
blue has a 6-bit D/A converter, allowing 64 combinations for each color, making a total of 
18 bits used for the palette, which gives rise to a total of 262,144 (2^18) hues. If the video 
DAC size is expanded from 6 to 8 bits, the number of combinations for the three signals will 
be 256 x 256 x 256 = 2^24 = 16,777,216 hues for the color palette, which is referred to as 
16.7 million colors in many advertisements. 
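
The palette sizes quoted above follow directly from the width of the video DAC. A minimal Python check, with the hypothetical function name palette_size:

```python
# Sketch: number of hues available from three D/A converters of equal width.

def palette_size(bits_per_color):
    """Each of red, green, and blue has 2**bits_per_color levels."""
    levels = 2 ** bits_per_color
    return levels * levels * levels

print(palette_size(6))   # 6-bit DAC of VGA
print(palette_size(8))   # 8-bit DAC
```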



[Figure: four parallel memory planes, each supplying one bit of the 4-bit pixel values] 


Figure 16-9. 16-Color EGA and VGA Mode Pixel Mapping 


Video memory size and color relation for VGA 


In VGA, 640 x 480 resolution with 16 colors displayed at one time requires a minimum 
of 640 x 480 x 4 = 1,228,800 bits (150K bytes) of memory, but due to the architectural 
design of VGA, there must be 256K bytes of memory available on the video board. Using 
the concept of planes means that each plane has 64K bytes. See Figure 16-9. Displaying 
256 colors at the same resolution would require 640 x 480 x 8 = 2,457,600 bits (300K 
bytes), which is why Table 16-4 lists 512K for that case. VGA is downward compatible 
with both CGA and EGA in graphics mode. 
To access one of the 256 palette registers of VGA, set AH = 10H and use option AL = 10H 
of INT 10H. As mentioned earlier, for the AH = 10H mode there are many options avail- 
able for pixel programming of both EGA and VGA. These options are selected through 
register AL. 


The case of SVGA 


In SVGA, all the resolutions of 800 x 600, 1024 x 768, and 1024 x 1024 are sup- 
ported. The memory requirement for these boards can reach millions of bytes, depending 
on the number of colors supported. For example, SVGA of 800 x 600 pixels with 256 col- 
ors displayed at the same time requires a minimum of 800 x 600 x 8 = 3,840,000 bits of 
memory, or a total of 480,000 bytes. Due to the use of bit planes, a total of 512K bytes of 
DRAM is needed. See Table 16-4. Another example is the total memory required by 800 
x 600 resolution with 16 million colors. In this case we need 800 x 600 x 24 = 11,520,000 
bits or 1,440,000 bytes, or 1406K bytes. Due to the use of bit planes, it uses 1.5M bytes 
of DRAM (see Table 16-4). 
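
As a rough check of the two SVGA calculations above, the following Python sketch computes the minimum video memory for a given resolution and color depth. The function name is hypothetical, and the sketch deliberately stops at the minimum figure; the amount of DRAM actually fitted on a board (512K, 1.5M, and so on) is a board design choice that is not modeled here.

```python
def min_video_memory(width: int, height: int, bits_per_pixel: int):
    """Return (bits, bytes, Kbytes) needed to hold one full screen."""
    bits = width * height * bits_per_pixel
    nbytes = bits // 8
    return bits, nbytes, nbytes / 1024

# 800 x 600 with 256 colors (8 bits per pixel)
print(min_video_memory(800, 600, 8))
# 800 x 600 with 16 million colors (24 bits per pixel)
print(min_video_memory(800, 600, 24))
```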


INT 10H and pixel programming 


To address a single pixel on the screen, use INT 10H with AH = 0CH. To do that,
X and Y coordinates of the pixel must be known. The values for X (column) and Y (row) 
vary depending on the resolution of the monitor. CX holds the column point (the X coor- 
dinate) and DX the row point (Y coordinate). If the display mode supports more than one 
page, then BH = page number; otherwise, it is ignored. To turn the pixel on or off, AL = 
1 or AL = 0 for black and white. The value of AL can be modified for various colors. 


Drawing horizontal or vertical lines in graphics mode 


To draw a horizontal line, choose values for the row and column points at the 
beginning of the line and then continue to increment the column until it reaches the end 
of the line as shown in Example 16-13. 




Table 16-4: Video Memory Requirements by Resolution


Resolution     16 Colors   256 Colors   65,536 Colors   16,777,216 Colors
               (4 bits)    (8 bits)     (16 bits)       (24 bits)

640 x 480      256K        512K
800 x 600      256K        512K                         1.5M
1024 x 768     512K
1600 x 1200


Example 16-13


Using INT 10H, write a program to:

(a) clear the screen

(b) set the mode to CGA of 640 x 200 resolution

(c) draw a horizontal line starting at column 50 and row 50 and ending at column 200, row
50


Solution:

;clear the screen
        MOV  AX,0600H
        MOV  BH,07
        MOV  CX,0
        MOV  DX,184FH
        INT  10H
;set the mode to 06 (CGA high resolution)
        MOV  AH,00
        MOV  AL,06
        INT  10H
;draw the horizontal line from (50,50) to (200,50)
        MOV  CX,50       ;col pixel=50
        MOV  DX,50       ;row pixel=50
BACK:   MOV  AH,0CH      ;0CH option to write a pixel
        MOV  AL,01       ;turn on the pixel
        INT  10H
        INC  CX          ;increment horizontal position
        CMP  CX,200      ;check for the last position
        JNZ  BACK        ;if not, continue


Review Questions 


1. The 320 x 200 resolution CGA can support _ colors. 

2. True or false. In 640 x 200 resolution, the pixel color can be black or white. 

3. As the number of pixels goes up, there is (more, less) video memory for 
storage of color bits. 

4. If a total of 24 bits is set aside for color, how many colors are available?

5. Calculate the total video memory needed for 1024 x 768 resolution with 16 colors 
displayed at the same time. 




PROBLEMS 


SECTION 16.1: PRINCIPLES OF MONITORS AND VIDEO MODES 


1. Calculate the number of scan lines and dots per line for each of the following. 
(a) 640 x 200 EGA, DF = 14.318 MHz, HF = 15.75 kHz, and VF = 60 Hz 
(b) 640 x 350 EGA, DF = 16.257 MHz, HF = 21.85 kHz, and VF = 60 Hz 
(c) 640 x 480 VGA, DF = 25.175 MHz, HF = 31.5 kHz, and VF = 60 Hz 
2. In Problem 1, find the number of scan lines and pixels used for overscan and retrace. 
3. The following table is from PC Magazine, July 1993, showing the recommended dot 
pitch for various resolutions and monitor size. Calculate the diagonal size used by the 
image on screen for each case. 


Monitor size   640 x 480   800 x 600   1,024 x 768   1,280 x 1,024
14"            0.35        0.28        0.22          0.18
15"            0.38        0.30        0.24          0.19
17"            0.43        0.34        0.27          0.22
20"            0.50        0.40        0.31          0.25


4. A person wants to use a 14-inch monitor of 0.5 mm dot pitch for 640 x 480 VGA color
resolution applications. Show by mathematics why he (she) cannot do that.

5. Dot pitch refers to the size of the (pixel, distance between pixels).

6. True or false. In color monitors each pixel has its own unique color.

7. For 640 x 200 resolution used for the 80 x 25 characters per screen, find the size of
the character box.

8. A 320 x 200 resolution used for the 8 x 8 character box allows only ____ characters
per screen.


SECTION 16.2: TEXT MODE PROGRAMMING AND VIDEO RAM 


9. True or false. VGA supports all previous video standards. 

10. Give the starting memory address used by VGA emulating MDA. 

11. Rewrite Example 16-7, to display letter "B" in 80 locations with the attributes of blue 
on red. 

12. Rewrite Example 16-8 for white on black blinking. 

13. Draw the character boxes for the letter B and digits 2 and 9, both in CGA and VGA. 


SECTION 16.3: GRAPHICS AND GRAPHICS PROGRAMMING 


14. True or false. The more color a given video board supports, the more DRAM it needs. 

15. True or false. The more pixels a given video board supports, the more DRAM it needs. 

16. The 16-bit color depth can support how many color hues? 

17. Why can 640 x 200 CGA support only black and white?

18. In a given VGA board with 256K bytes of memory, how can it fit into space 
A0000—AFFFFH? 

19. To support 16,777,216 colors, the number of bits set aside for color depth must be ____.


20. Verify the memory requirements of the video board resolution and color depth of
Table 16-4 for the following cases.
(a) 640 x 480 of 16, 256, 65,536 and 16 million colors
(b) 1024 x 768 of 16, 256, 65,536 and 16 million colors
(c) 1600 x 1200 of 256 and 65,536 colors

21. Write a program to change the video mode to 640 x 200 CGA high-resolution graphics
and then draw two vertical lines splitting the screen into three equal sections.




ANSWERS TO REVIEW QUESTIONS 


SECTION 16.1: PRINCIPLES OF MONITORS AND VIDEO MODES 


1. Raster scan
2. Horizontal
3. True
4. (a) DF (b) VF
5. False
6. True
7. True
8. True


SECTION 16.2: TEXT MODE PROGRAMMING AND VIDEO RAM 


1. (a) 00H = 0000 0000B is black on black. (b) 14H = 0001 0100B is red on blue.
2. Foreground
3. B8000H
4. B0000H
5. True
6. False


SECTION 16.3: GRAPHICS AND GRAPHICS PROGRAMMING 


1. 4
2. True
3. Less
4. 16.7 million
5. 1024 x 768 x 4 = 3,145,728 bits = 384K bytes, but it uses 512 KB due to bit planes.




CHAPTER 17 


SERIAL PORT PROGRAMMING 
WITH ASSEMBLY AND C# 


OBJECTIVES 
Upon completion of this chapter, you will be able to: 


>> List the advantages of serial communication over parallel communication 

>> Explain the difference between synchronous and asynchronous 
communication 

>> Define the terms simplex, half duplex, and full duplex and diagram their 
implementation in serial communication 

>> Describe how start and stop bits frame data for serial communication 

>> Contrast and compare the measures baud rate and bps (bits per second) 

>> Describe the RS232 standard 

>> Contrast and compare DTE (data terminal) versus DCE (data 
communication) equipment 

>> Describe the purpose of handshaking signals such as DTR, RTS, and CTS 

>> Code Assembly language instructions to perform serial communication 
using BIOS INT 14H 

>> Describe the function of UART and USART chips 




Computers transfer data in two ways: parallel and serial. In parallel data transfers, 
often eight or more lines (wire conductors) are used to transfer data to a device that is only 
a few feet away. Examples of parallel transfers are printers and hard disks using cables 
with many wire strips. Although in such cases a lot of data can be transferred in a short 
amount of time by using many wires in parallel, the distance cannot be great. To transfer 
data to a device located many meters away, the serial method is used. In serial communi- 
cation, the data is sent one bit at a time, in contrast to parallel communication, in which 
the data is sent a byte or more at a time. Serial communication and the study of associat- 
ed chips are the topics of this chapter. 


SECTION 17.1: BASICS OF SERIAL COMMUNICATION 


When a microprocessor communicates with the outside world it provides the data 
in byte-sized chunks. In some cases, such as printers, the information is simply grabbed 
from the 8-bit data bus and presented to the 8-bit data bus of the printer. This can work if 
the cable is not too long since long cables diminish and even distort signals. In addition, 
an 8-bit data path is expensive. For these reasons, serial communication is used for trans- 
ferring data between two systems located at distances of hundreds of feet to millions of 
miles apart. Figure 17-1 diagrams serial versus parallel data transfers. 


[Figure: side-by-side panels labeled Serial Transfer and Parallel Transfer]


Figure 17-1. Serial versus Parallel Data Transfer 


The fact that in serial communication a single data line is used instead of the 8- 
bit data line of parallel communication not only makes it much cheaper but also makes it 
possible for two computers located in two different cities to communicate over the tele- 
phone. 

For serial data communication to work, the byte of data must be grabbed from the 
8-bit data bus of the microprocessor and converted to serial bits using a parallel-in-serial- 
out shift register; then it can be transmitted over a single data line. This also means that at 
the receiving end there must be a serial-in-parallel-out shift register to receive the serial 
data, pack it into a byte, and present it to the system at the receiving end. Of course, if data 
is to be transferred on the telephone line, it must be converted from 0s and 1s to audio
tones, which are sinusoidal-shaped signals. This conversion is performed by a peripheral
device called a modem, which stands for "modulator/demodulator." 

When the distance is short, the digital signal can be transferred as it is on a sim- 
ple wire and requires no modulation. This is how IBM PC keyboards transfer data 
between the keyboard and the motherboard. However, for long-distance data transfers 
using communication lines such as a telephone, serial data communication requires a 
modem to modulate (convert from 0s and 1s to audio tones) and demodulate (convert from 
audio tones to 0s and 1s).

Serial data communication uses two methods, asynchronous and synchronous.
The synchronous method transfers a block of data (characters) at a time while the
asynchronous method transfers a single byte at a time.

It is possible to write software to use either of these methods, but the programs 
can be tedious and long. For this reason, special IC chips are made by many manufactur- 
ers for serial data communications. These chips are commonly referred to as UART (uni- 
versal asynchronous receiver-transmitter) and USART (universal synchronous-asynchro- 
nous receiver-transmitter). The COM port in the x86 PC uses the UART. 


Half- and full-duplex transmission 


In data transmission a duplex transmission is one in which the data can be trans- 
mitted and received. This is in contrast to simplex transmissions such as printers, in which 
the computer only sends data. Duplex transmissions can be half or full duplex, depending 
on whether or not the data transfer can be simultaneous. If data is transmitted one way at 
a time, it is referred to as half duplex. If the data can go both ways at the same time, it is 
full duplex. Of course, full duplex requires two wire conductors for the data lines (in addi- 
tion to ground), one for transmission and one for reception, in order to transfer and receive 
data simultaneously. See Figure 17-2. 


[Figure: simplex, half-duplex, and full-duplex links between transmitter and receiver]


Figure 17-2. Simplex, Half-, and Full-Duplex Transfers 


Asynchronous serial communication and data framing 


The data coming in at the receiving end of the data line in a serial data transfer is 
all 0s and 1s; it is difficult to make sense of the data unless the sender and receiver agree
on a set of rules, a protocol, on how the data is packed, how many bits constitute a char- 
acter, and when the data begins and ends. 


Start and stop bits 


Asynchronous serial data communication is widely used for character-oriented 
transmissions, and block-oriented data transfers use the synchronous method. In the asyn- 
chronous method, each character is put between start and stop bits. This is called framing. 
In data framing for asynchronous communications, the data, such as ASCII characters, are 
packed between a start bit and a stop bit. The start bit is always one bit but the stop bit 
can be one or two bits. The start bit is always a 0 (low) and the stop bit(s) is 1 (high). For 
example, look at Figure 17-3 where the ASCII character "A", binary 0100 0001, is framed
between the start bit and 2 stop bits. Notice that the LSB is sent out first. 




In Figure 17-3, when there is no transfer the signal is 1 (high), which is referred 
to as mark. The 0 (low) is referred to as space. Notice that the transmission begins with a 
start bit followed by D0, the LSB, then the rest of the bits until the MSB (D7), and finally,
the 2 stop bits indicating the end of the character "A".


[Figure: frame for ASCII "A"; the start bit goes out first and the stop bits go out last]


Figure 17-3. Framing ASCII “A” (41H) 


In asynchronous serial communications, peripheral chips and modems can be programmed
for data that is 5, 6, 7, or 8 bits wide. This is in addition to the number of stop bits,
1 or 2. While in older systems ASCII characters were 7-bit, due to extended ASCII characters,
8 bits are required for each character. Small non-ASCII keyboards use 5- and 6-bit
characters. In some older systems, due to the slowness of the receiving mechanical device, 
2 stop bits were used to give the device sufficient time to organize itself before transmis- 
sion of the next byte. However, in modern PCs the use of 1 stop bit is common. Assuming 
that we are transferring a text file of ASCII characters using 2 stop bits, we have a total of 
11 bits for each character since 8 bits are for the ASCII code, and 1 and 2 bits are for start 
and stop bits, respectively. Therefore, for each 8-bit character there are an extra 3 bits, or 
more than 30% overhead. 
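
The framing rules and the overhead figure above can be sketched in a few lines of Python. This is only an illustration and not part of the original text; the function name frame_bits is made up. It lists the bits in the order they appear on the line: a start bit of 0, the data bits with the LSB first, then the stop bit(s) at 1.

```python
def frame_bits(byte: int, data_bits: int = 8, stop_bits: int = 1) -> list:
    """Bits in transmission order: start bit, D0 (LSB) ... MSB, stop bit(s)."""
    data = [(byte >> i) & 1 for i in range(data_bits)]  # LSB goes out first
    return [0] + data + [1] * stop_bits

frame = frame_bits(0x41, stop_bits=2)     # ASCII "A" with 2 stop bits
overhead = (len(frame) - 8) / 8           # extra bits relative to the 8 data bits
print(frame, len(frame), overhead)
```

With 2 stop bits the frame is 11 bits long, so 3 of them carry no character data, which is where the "more than 30%" figure comes from.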

In some systems in order to maintain data integrity, the parity bit of the character 
byte is included in the data frame. This means that for each character (7- or 8-bit, depend- 
ing on the system) we have a single parity bit in addition to start and stop bits. The pari- 
ty bit is odd or even. In the case of an odd-parity bit the number of data bits, including the 
parity bit, has an odd number of 1s. Similarly, with an even-parity bit the total number of
1s, including the parity bit, is even. For example, the ASCII character "A", binary 0100
0001, has 0 for the even-parity bit. UART chips allow programming of the parity bit for 
odd-, even-, and no-parity options, as we will see in the next section. If a system requires 
the parity, the parity bit is transmitted after the MSB, and is followed by the stop bit. 
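
A minimal Python sketch of the parity rule follows. It is not from the original text and the function name parity_bit is hypothetical; it counts the 1s in the character and picks the parity bit that makes the total number of 1s even or odd.

```python
def parity_bit(value: int, even: bool = True) -> int:
    """Parity bit that makes the total count of 1s even (or odd)."""
    ones = bin(value).count("1")
    even_bit = ones % 2            # 1 only if the data has an odd number of 1s
    return even_bit if even else even_bit ^ 1

print(parity_bit(0x41))              # ASCII "A", even parity
print(parity_bit(0x41, even=False))  # ASCII "A", odd parity
```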


Data transfer rate 


The rate of data transfer in serial data communication is stated in bps (bits per sec- 
ond). Another widely used terminology for bps is baud rate. However, the baud and bps 
rates are not necessarily equal. This is due to the fact that baud rate is the modem termi- 
nology and is defined as number of signal changes per second. In modems, there are occa- 
sions when a single change of signal transfers several bits of data. As far as the conductor 
wire is concerned, the baud rate and bps are the same, and for this reason in this book we 
use the terms bps and baud interchangeably. 
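
The distinction between baud and bps can be written as one multiplication. The Python sketch below is an illustration only; the function name and the sample figures (2400 signal changes per second carrying 4 bits each) are assumptions chosen for the example, not values taken from the text.

```python
def bits_per_second(baud: int, bits_per_signal_change: int = 1) -> int:
    """bps = signal changes per second x bits carried by each change."""
    return baud * bits_per_signal_change

print(bits_per_second(9600))       # on a plain wire, one bit per change
print(bits_per_second(2400, 4))    # a modem packing 4 bits into each change
```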

The data transfer rate of a given computer system depends on communication 
ports incorporated into that system. For example, the early IBM PC could transfer data at 
the rate of 100 to 9600 bps. However, today’s x86 PC can transfer data at rates as high as 
115,200 bps. It must be noted that in asynchronous serial data communication, the baud 
rate is generally limited to less than 100,000 bps when using a telephone line and modem. 
See Examples 17-1, 17-2, and 17-3. 
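
The arithmetic used in Examples 17-1 through 17-3 can be sketched in Python as follows. The function names are made up for illustration; each character is assumed to cost one start bit, the data bits, an optional parity bit, and the stop bit(s).

```python
def bits_per_char(data_bits: int = 8, stop_bits: int = 1, parity: bool = False) -> int:
    """Start bit + data bits + optional parity bit + stop bit(s)."""
    return 1 + data_bits + (1 if parity else 0) + stop_bits

def transfer_seconds(chars: int, bps: int, **framing) -> float:
    """Time to send `chars` characters at `bps` bits per second."""
    return chars * bits_per_char(**framing) / bps

pages = 50 * 80 * 25                                # characters in 50 pages
print(pages * bits_per_char())                      # total bits (Example 17-1)
print(transfer_seconds(pages, 9600))                # Example 17-2 (a)
print(transfer_seconds(pages, 57600))               # Example 17-2 (b)
print(transfer_seconds(2_000_000_000, 57600) / 86400)  # Example 17-3, in days
```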


RS232 and other serial I/O standards 


To allow compatibility among data communication equipment made by various 
manufacturers, an interfacing standard called RS232 was set by the Electronics Industries 
Association (EIA) in 1960. In 1963 it was modified and called RS232A. RS232B and 
RS232C were issued in 1965 and 1969, respectively. In this book we refer to it simply as
RS232.


Example 17-1


Calculate the total number of bits used in transferring 50 pages, each with 80 x 25
characters. Assume 8 bits per character and 1 stop bit.


Solution:


For each character a total of 10 bits is used, 8 bits for the character, 1 stop bit, and 1 start
bit. Therefore, the total number of bits is 80 x 25 x 10 = 20,000 bits per page. For 50
pages, 1,000,000 bits will be transferred.


Example 17-2


Calculate the time it takes to transfer the entire 50 pages of data in Example 17-1 using
a baud rate of:
(a) 9600 (b) 57,600


Solution:


(a) 1,000,000 / 9600 = 104 seconds
(b) 1,000,000 / 57,600 = 17 seconds


Example 17-3


Calculate the time it takes to download a movie of 2 gigabytes using a telephone line.
Assume 8 bits, 1 stop bit, no parity, and 57,600 baud rate.


Solution:


2 x 1,000,000,000 x 10 / 57,600 = 347,222 seconds = 4 days


Today, RS232 is the most widely used serial I/O interfacing standard. However,
since the standard was set long before the advent of the TTL logic family, the input and
output voltage levels are not TTL compatible. In RS232 a 1 is represented by -3 to
-25 V, while the 0 bit is +3 to +25 V, making -3 to +3 V undefined. For this reason, to
connect any RS232 to a TTL-level chip (microprocessor or UART) we must use voltage
converters such as MAX232 or MAX233 to convert the TTL logic levels to the RS232
voltage level, and vice versa. MAX232 and MAX233 IC chips are commonly referred to as
line drivers. This is shown in Figures 17-4 and 17-5. The MAX232 has two sets of line
drivers for transferring and receiving data, as shown in Figure 17-4. The line drivers used
for TxD are called T1 and T2, while the line drivers for RxD are designated as R1 and R2.
In many applications only one of each is used. Notice in MAX232 that the T1 line driver
has a designation of T1in and T1out on pin numbers 11 and 14, respectively. The T1in pin
is the TTL side and is connected to TxD of the USART, while T1out is the RS232 side
that is connected to the RxD pin of the RS232 DB connector. The R1 line driver has a
designation of R1in and R1out on pin numbers 13 and 12, respectively. The R1in (pin 13) is
the RS232 side that is connected to the TxD pin of the RS232 DB connector, and R1out
(pin 12) is the TTL side that is connected to the RxD pin of the USART. See Figure 17-4.
Notice the null modem connection where RxD for one is TxD for the other.
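
The RS232 voltage ranges described above can be expressed as a small Python sketch. It is an illustration only; the function name rs232_level is made up, and treating the end points (3 V and 25 V) as inside the valid ranges is an assumption of this sketch.

```python
def rs232_level(volts: float):
    """Return 1 for -3 to -25 V, 0 for +3 to +25 V, None otherwise."""
    if -25 <= volts <= -3:
        return 1          # logic 1 is a negative voltage
    if 3 <= volts <= 25:
        return 0          # logic 0 is a positive voltage
    return None           # between -3 and +3 V (or out of range): undefined

for v in (-12, 12, 0, 5):
    print(v, rs232_level(v))
```

Note that a TTL high of 5 V falls in the range RS232 reads as 0, which is one way to see why a line driver is needed between the two.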

MAX232 requires four capacitors ranging from 1 to 22 µF. The most widely used
value for these capacitors is 22 µF. To save board space, some designers use the MAX233
chip from Maxim. The MAX233 performs the same job as the MAX232 but eliminates 
the need for capacitors. However, the MAX233 chip is much more expensive than the 
MAX232. See Figure 17-5 for MAX233 with no capacitor used. 



[Figure: MAX232 and MAX233 line drivers, each connected to a USART (TTL)]


Figure 17-5. Inside MAX233 and Its Connection to the USART


RS232 pins


Table 17-1 provides the pins and their labels for the RS232 cable, commonly 
referred to as the DB-9 connector. The x86 PC 9-pin serial port is shown in Figure 17-6. 


Data communication classification 


Current terminology classifies data communication equipment as DTE (data ter- 
minal equipment) or DCE (data communication equipment). DTE refers to terminals and 
computers that send and receive data, while DCE refers to communication equipment, 
such as modems, that is responsible for transferring the data. Notice that all the RS232 pin 
function definitions of Table 17-1 are from the DTE point of view. 

The simplest connection between two PCs (DTE and DTE) requires a minimum 
of three pins, TxD, RxD, and ground, as shown in Figure 17-7. Notice that the connection 
between two DTE devices, such as two PCs, requires pins 2 and 3 to be interchanged as 
shown in Figure 17-7. In looking at Figure 17-7, keep in mind that the RS232 signal def- 
initions are from the point of view of DTE. 
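
The three-wire null modem connection can be written down as a pin mapping. The Python sketch below is only an illustration; the names are made up, and the pin numbers are the DB-9 assignments for TxD, RxD, and ground listed in Table 17-1.

```python
# DB-9 pin numbers from the DTE point of view (see Table 17-1)
RXD, TXD, GND = 2, 3, 5

def null_modem_wiring() -> dict:
    """Map each pin on one PC to the pin it is wired to on the other PC."""
    return {TXD: RXD, RXD: TXD, GND: GND}   # pins 2 and 3 are interchanged

for pin_a, pin_b in sorted(null_modem_wiring().items()):
    print(f"PC A pin {pin_a} <-> PC B pin {pin_b}")
```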




Figure 17-6. DB-9 9-Pin Connector


Table 17-1: IBM PC DB-9 Signals


Pin   Description
1     Data carrier detect (DCD)
2     Received data (RxD)
3     Transmitted data (TxD)
4     Data terminal ready (DTR)
5     Signal ground (GND)
6     Data set ready (DSR)
7     Request to send (RTS)
8     Clear to send (CTS)
9     Ring indicator (RI)


[Figure: two DTE (x86 PC) ports joined by a null modem cable]


Figure 17-7. Null Modem Connection For Data Lines


Examining the RS232 handshaking signals


To ensure fast and reliable data transmission between two devices, the data transfer
must be coordinated. Because in serial data communication the receiving device may have
no room for the data, there must be a way to inform the sender to stop sending data.

Some of the pins of the RS232 are used for handshaking signals. They are described
below.


1. DTR (data terminal ready). When the terminal (or a PC COM port) is turned on, after
going through a self-test, it sends out signal DTR to indicate that it is ready for
communication. If there is something wrong with the COM port, this signal will not
be activated. This is an active-low signal and can be used to inform the modem that
the computer is alive and kicking. This is an output pin from DTE (PC COM port)
and an input to the modem.

2. DSR (data set ready). When a DCE (modem) is turned on and has gone through the
self-test, it asserts DSR to indicate that it is ready to communicate. Therefore, it is
an output from the modem (DCE) and an input to the PC (DTE). This is an active-low
signal. If for any reason the modem cannot make a connection to the telephone, this
signal remains inactive, indicating to the PC (or terminal) that it cannot accept or
send data.

3. RTS (request to send). When the DTE device (such as a PC) has a byte to transmit,
it asserts RTS to signal the modem that it has a byte of data to transmit. RTS is an
active-low output from the DTE and an input to the modem.

4. CTS (clear to send). In response to RTS, when the modem has room for storing the
data it is to receive, it sends out signal CTS to the DTE (PC) to indicate that it can
receive the data now. This input signal to the DTE is used by the DTE to start
transmission.

5. CD (carrier detect, or DCD, data carrier detect). The modem asserts signal DCD to
inform the DTE (PC) that a valid carrier has been detected and that contact
between it and the other modem is established. Therefore, DCD is an output from
the modem and an input to the PC (DTE).

6. RI (ring indicator). An output from the modem (DCE) and an input to a PC (DTE)
indicates that the telephone is ringing. It goes on and off in synchronization with
the ringing sound. Of the six handshake signals, this is the least often used, due to
the fact that modems take care of answering the phone. However, if in a given
system the PC is in charge of answering the phone, this signal can be used.


From the above description, PC and modem communication can be summarized 
as follows: While signals DTR and DSR are used by the PC and modem, respectively, to 
indicate that they are alive and well, it is RTS and CTS that actually control the flow of 
data. When the PC wants to send 
data it asserts RTS, and in response, 
if the modem is ready (has room) to 
accept the data, it sends back CTS. 
If, for lack of room, the modem does 
not activate CTS, the PC will 
deassert DTR and try again. RTS and 
CTS are also referred to as hardware 
control flow signals. See Figure 
17-8. 


This concludes the descrip- 
tion of the most important pins of the 
RS232 handshake signals plus TxD, 
RxD, and ground. Ground is also 
referred to as SG (signal ground). In 
the next section we will see serial 
communication programming in the 
x86 PC. 


Figure 17-8. Null Modem Connection with 
Control Signals 


Review Questions 


1. The transfer of data using parallel lines is (faster, slower) but 
(more expensive, less expensive). 
2. In communications between two PCs in New York and Dallas, we use 
(serial, parallel) data communication. 


3. In serial data communication, which method fits block-oriented data? 

4. True or false. Sending data to a printer is duplex. 

5. True or false. In duplex we must have two data lines. 

6. The start and stop bits are used in the (synchronous, asynchronous) 
method. 


7. Assuming that we are transmitting letter "D", binary 100 0100, with odd-parity bit 
and 2 stop bits, show the sequence of bits transferred. 

8. In Question 7, find the overhead due to framing. 

9. Calculate the time it takes to transfer 400 characters as in Question 7 if we use 
1200 bps. What percentage of time is wasted due to overhead?

10. True or false. RS232 is not TTL-compatible. 




SECTION 17.2: PROGRAMMING x86 PC COM PORTS 
USING ASSEMBLY AND C# 


To relieve users and programmers from the tedious details of the USART chip, 
both Windows and BIOS provide means of accessing the x86 PC serial COM ports. In 
this section we show how to use Assembly language and C# to program the COM
ports. First some introductory comments on the number of ports in the PC will be given. 


IBM PC COM ports 


In the x86 PC, as many as four COM ports can be installed. They are numbered 
1, 2, 3, and 4 (BIOS numbers them as 0, 1, 2, and 3). When the PC is turned on, it is the
job of the POST (power-on self-test) to test the USART chip for each of the four COM 
ports. If they are installed, their I/O port addresses are written to memory locations 
0040:0000-0040:0007. Since the I/O address assigned to each UART is a 16-bit address, 
it takes 2 bytes for each installed UART. The BIOS data area memory locations 0040:0000 
and 0040:0001 will have the I/O port address for COM 0, and the 0040:0002 and
0040:0003 locations have the I/O port address for COM 1, and so on. We can also use the 
System Tools option in the Accessories menu in Windows to see the COM ports installed 
in the x86 PC. See Example 17-4. 


Example 17-4 


A nationally known computer columnist is asked by a reader how he/she can find the 
number of COM ports installed in a PC and which one is installed. What do you think 
the answer should be? 


Solution: 


Dumping memory locations 0040:0000—0040:0007 in DEBUG on the computer, as 
shown below, showed that there is only one COM port with starting address 03F8H. 


C:\>DEBUG 
-d 0040:0000 L08
0040:0000  F8 03 00 00 00 00 00 00
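
The reading of the dump above can be sketched in Python. This is only an illustration and not part of the original text; the function name is made up. It treats the 8 bytes at 0040:0000 as four little-endian 16-bit words and reports the nonzero ones, using the user-visible COM1 to COM4 numbering.

```python
import struct

def com_port_addresses(bda_bytes: bytes) -> dict:
    """Decode the four 16-bit COM port I/O addresses stored at 0040:0000."""
    words = struct.unpack("<4H", bda_bytes[:8])     # low byte is stored first
    return {f"COM{i + 1}": addr for i, addr in enumerate(words) if addr != 0}

dump = bytes.fromhex("F8 03 00 00 00 00 00 00")     # bytes shown by DEBUG
for name, addr in com_port_addresses(dump).items():
    print(name, hex(addr))
```

A zero word means that no port was found for that slot, which is why the dump above indicates a single COM port at 03F8H.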


Using HyperTerminal on x86 PC


HyperTerminal is a widely used utility that comes with Windows 2000/XP. It allows
communication with the x86 PC via the serial COM port. Some of the baud rates supported
by HyperTerminal are shown in Table 17-2. The only problem with HyperTerminal is that
it is not dynamic, which means we cannot incorporate it into our program. To do that, we
can use INT 14H in Assembly language or the COM port statement in C#.


Table 17-2: Some HyperTerminal Baud Rates


300
600
1,200
2,400
4,800
9,600
19,200
38,400
57,600
115,200


Programming COM ports using BIOS INT 14H with Assembly


The serial communication ports of the x86 PC can be accessed using the BIOS-based
INT 14H. Various options of INT 14H are chosen with the AH register as shown in Figure
17-9. Using BIOS INT 14H we can send and receive characters with another PC via a COM
port. The process is as follows.


1. To send a character we use INT 14H, AH = 1, AL = character. 




2. To receive a character we use INT 14H, AH = 3 to get the COM port's status in 
register AH. Notice that this is the status of the COM port and not the status of the 
modem, which is given in AL. Then check D0 of the status port, which is called
received data ready. If it is high, a character has been received via the COM port 
and is sitting inside the USART. To read the received character we use INT 14H, 
AH = 2 where AL holds the character upon return. 
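
The status check in step 2 amounts to testing individual bits of the byte returned in AH. The Python sketch below illustrates that decoding; it is not part of the original text, the names are made up, and the bit assignments follow the port status list in Figure 17-9, read from bit 7 (timed-out) down to bit 0 (received data ready).

```python
# Index in the list = bit number in the AH port status byte
PORT_STATUS_BITS = [
    "received data ready",              # D0
    "overrun error detected",           # D1
    "parity error detected",            # D2
    "framing error detected",           # D3
    "break detected",                   # D4
    "transmit holding register empty",  # D5
    "transmit shift register empty",    # D6
    "timed-out",                        # D7
]

def decode_port_status(ah: int) -> list:
    """Names of all status bits that are set in AH."""
    return [name for bit, name in enumerate(PORT_STATUS_BITS) if ah & (1 << bit)]

def data_ready(ah: int) -> bool:
    """True when D0 (received data ready) is high."""
    return bool(ah & 0x01)

print(decode_port_status(0x61), data_ready(0x61))
```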


INT 14H Function


00  Initialize COM Port

    Additional Call Registers          Result Registers
    AL = parameter (see below)         AH = port status (see below)
    DX = port number (0 if COM1,       AL = modem status (see below)
         1 if COM2, etc.)

    Note 1: The parameter byte in AL is defined as follows:

    Bits   Indicates
    7-5    Baud rate (000 = 110, 001 = 150,
           010 = 300, 011 = 600, 100 = 1200,
           101 = 2400, 110 = 4800, 111 = 9600)
    4-3    Parity (01 = odd, 11 = even, x0 = none)
    2      Stop bits (0 = 1, 1 = 2)
    1-0    Word length (10 = 7 bits, 11 = 8 bits)

    Note 2: The port status returned in AH is defined as follows:

    Bit    Indicates
    7      Timed-out
    6      Transmit shift register empty
    5      Transmit holding register empty
    4      Break detected
    3      Framing error detected
    2      Parity error detected
    1      Overrun error detected
    0      Received data ready

    Note 3: The modem status returned in AL is defined as follows:

    Bit    Indicates
    7      Received line signal detect
    6      Ring indicator
    5      DSR (data set ready)
    4      CTS (clear to send)
    3      Change in receive line signal detect
    2      Trailing edge ring indicator
    1      Change in DSR status
    0      Change in CTS status


Figure 17-9. BIOS INT 14H Functions (continued on the following page) 
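
As an illustration of how the parameter byte for function 00 is put together, the Python sketch below packs the four fields into one byte. It is not part of the original text and the names are made up. The field codes are the ones listed in Note 1; the bit positions (baud in bits 7-5, parity in bits 4-3, stop bits in bit 2, word length in bits 1-0) are assumed from the conventional INT 14H layout, and they agree with the value 0C3H that Program 17-1 uses for 4800 baud, no parity, 1 stop bit, and 8-bit data.

```python
BAUD_CODES = {110: 0b000, 150: 0b001, 300: 0b010, 600: 0b011,
              1200: 0b100, 2400: 0b101, 4800: 0b110, 9600: 0b111}
PARITY_CODES = {"none": 0b00, "odd": 0b01, "even": 0b11}
LENGTH_CODES = {7: 0b10, 8: 0b11}

def init_parameter(baud: int, parity: str = "none",
                   stop_bits: int = 1, word_length: int = 8) -> int:
    """Build the AL parameter byte for INT 14H, AH = 0."""
    return ((BAUD_CODES[baud] << 5)
            | (PARITY_CODES[parity] << 3)
            | ((stop_bits - 1) << 2)
            | LENGTH_CODES[word_length])

print(f"{init_parameter(4800):02X}H")
print(f"{init_parameter(9600):02X}H")
```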




01  Write Character to COM Port

    Additional Call Registers          Result Registers
    AL = character                     AH bit 7 = 0 if successful, 1 if not
    DX = port number (0 if COM1,       AH bits 0-6 = status if successful
         1 if COM2, etc.)              AL = character

    Note: The status byte in AH, bits 0-6, after the call is as follows:

    Bit    Indicates
    6      Transmit shift register empty
    5      Transmit holding register empty
    4      Break detected
    3      Framing error detected
    2      Parity error detected
    1      Overrun error detected
    0      Receive data ready


02  Read Character from COM Port

    Additional Call Registers          Result Registers
    DX = port number (0 if COM1,       AH bit 7 = 0 if successful, 1 if not
         1 if COM2, etc.)              AH bits 0-6 = status if successful
                                       AL = character read

    Note: The status byte in AH, bits 1-4, after the call is as follows:

    Bit    Indicates
    4      Break detected
    3      Framing error detected
    2      Parity error detected
    1      Overrun error detected


03  Read COM Port Status

    Additional Call Registers          Result Registers
    DX = port number (0 if COM1,       AH = port status
         1 if COM2, etc.)              AL = modem status

    Note: The port status and modem status returned in AH and AL are the same
    format as in INT 14H function 00H, described above.


Figure 17-9. BIOS INT 14H Functions (continued from the previous page)


Program 17-1 shows the steps needed to read from and write to the COM port:


(1) Check for a key press. If a key has been pressed, get it and write it to the COM port
to be transferred. Also check for ESC to exit.

(2) If there is no key pressed, go check the status of the COM port. If a character has been
received, read it and display it on the screen.

(3) Go to step (1).




TITLE SERIAL DATA COMMUNICATION BETWEEN TWO PCS
        .MODEL SMALL
        .STACK
        .DATA
MESSAGE DB   'Serial communication via COM1, 4800, '
        DB   'No P,1 Stop,8-BIT DATA. ',0AH,0DH
        DB   'ANY KEY PRESS IS SENT TO OTHER PC. ',0AH,0DH
        DB   'PRESS ESC TO EXIT','$'
        .CODE
MAIN    PROC
        MOV  AX,@DATA
        MOV  DS,AX
        MOV  AH,09
        MOV  DX,OFFSET MESSAGE
        INT  21H
;initializing COM 1
        MOV  AH,0        ;initialize COM port
        MOV  DX,0        ;COM 1
        MOV  AL,0C3H     ;4800, NO P, 1 STOP, 8-BIT DATA
        INT  14H
;checking key press and sending key to COM1 to be transferred
AGAIN:  MOV  AH,01       ;check for key press using INT 16H, AH=01
        INT  16H         ;if ZF=1, there is no key press
        JZ   NEXT        ;if no key go check COM port
        MOV  AH,0        ;yes, there is a key press, get it
        INT  16H         ;notice we must use INT 16H twice, 2nd time
                         ;with AH=0 to get the char itself. AL=ASCII char pressed
        CMP  AL,1BH      ;is it esc key?
        JE   EXIT        ;yes, exit
        MOV  AH,1        ;no. send the char to COM 1 port
        MOV  DX,0
        INT  14H
;check COM1 port for a char. if so get it and display it
NEXT:   MOV  AH,03       ;get COM 1 status
        MOV  DX,0
        INT  14H
        AND  AH,01       ;AH has COM port status, mask all but D0
        CMP  AH,01       ;check D0 to see if there is a char
        JNE  AGAIN       ;no data, go to monitor keyboard
        MOV  AH,02       ;yes, COM1 has data: get it
        MOV  DX,0
        INT  14H         ;get it
        MOV  DL,AL       ;and display it using INT 21H
        MOV  AH,02       ;DL has char to be displayed
        INT  21H
        JMP  AGAIN       ;keep monitoring keyboard
EXIT:   MOV  AH,4CH      ;exit to DOS
        INT  21H
MAIN    ENDP
        END  MAIN


Program 17-1: Serial Communication Between Two PCs 
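
The value 0C3H that Program 17-1 loads into AL packs four settings into one byte. Under the INT 14H, AH = 0 parameter layout (bits 7-5 baud rate, bits 4-3 parity, bit 2 stop bits, bits 1-0 word length), it can be taken apart as in the Python sketch below. The function name decode_init_byte and the dictionary keys are assumptions made for this illustration, and only the 7-bit and 8-bit word lengths are listed.

```python
BAUD = {0: 110, 1: 150, 2: 300, 3: 600, 4: 1200, 5: 2400, 6: 4800, 7: 9600}
PARITY = {0b00: "none", 0b01: "odd", 0b10: "none", 0b11: "even"}
WORD_LENGTH = {0b10: 7, 0b11: 8}  # other codes are not covered by this sketch


def decode_init_byte(al):
    """Split the INT 14H, AH = 0 parameter byte into its four fields."""
    return {
        "baud": BAUD[(al >> 5) & 0b111],         # bits 7-5
        "parity": PARITY[(al >> 3) & 0b11],      # bits 4-3
        "stop_bits": 2 if al & 0b100 else 1,     # bit 2
        "data_bits": WORD_LENGTH.get(al & 0b11), # bits 1-0
    }


if __name__ == "__main__":
    print(decode_init_byte(0xC3))
```

For 0C3H = 1100 0011 binary this gives 4800 baud, no parity, 1 stop bit, and 8 data bits, which agrees with the comment in the program.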

Programming x86 COM port using C# 2005


To program the COM port we can use the System.IO.Ports namespace in C# 2005.
This is shown in Programs 17-2 and 17-3. Program 17-2 writes data to the COM port and
Program 17-3 reads data from the COM port.




//Program 17-2 Send data through the COM port.
//Must be compiled in Visual C# 2005 Express,
// which is available for free from Microsoft website.
using System;
using System.IO.Ports;

namespace SerialComm {
    class SerialOut {
        static void Main ()
        {
            // The following line will set the COM port parameters in C#.
            SerialPort com1 = new SerialPort ( "COM1", 9600,
                Parity.None, 8, StopBits.One );
            com1.Open ();   // Open the COM port 1.
            do
            {
                // Send the data through the COM port.
                com1.WriteLine ( "Hello World!!!" );
            }
            while (!Console.KeyAvailable);  // Press any key to close.
            com1.Close ();  // Close the COM port 1.
        }
    }
}


Program 17-2: Sending Data Through the COM Port 


//Program 17-3 Get data from COM port and display it on
// the screen.
//Must be compiled in Visual C# 2005 Express,
// which is available for free from Microsoft website.
using System;
using System.IO.Ports;

namespace SerialIn {
    class SerialIn {
        static void Main ()
        {
            // The following line will set the COM port parameters in C#.
            SerialPort com1 = new SerialPort ( "COM1", 9600,
                Parity.None, 8, StopBits.One );
            com1.Open ();   // Open the COM port 1.
            do
            {
                // Read from COM port and display on screen.
                Console.WriteLine ( com1.ReadLine () );
            }
            while (!Console.KeyAvailable);  // Press any key to close.
            com1.Close ();  // Close the COM port 1.
        }
    }
}


Program 17-3: Getting Data from the COM Port and Displaying on Screen 




Review Questions 


1. The maximum number of COM ports allowed in the PC is 

2. Give the minimum and maximum baud rates supported by HyperTerminal.

3. Give the minimum and maximum baud rates supported by INT 14H using AH = 0 
and AH = 14. 


PROBLEMS 


SECTION 17.1: BASICS OF SERIAL COMMUNICATION 


1. Which is more expensive, parallel or serial data transfer?

2. True or false. 0- and 5-V digital pulses can be transferred on the telephone without 
being converted (modulated). 

3. Show the framing of the letter ASCII "Z" (0101 1010), even parity, 1 stop bit. 

4. If there is no data transfer and the line is high, it is called ______ (mark, space).

5. What is space? 

6. Calculate the overhead percentage if the data size is 6 bits, 2 stop bits, even parity. 

7. True or false. The RS232 voltage specification is TTL compatible. 

8. What is the function of the MAX232 chip? 

9. True or false. The COM port on the back of an x86 PC uses an RS232 male connec- 
tor. 

10. How many pins of the RS232 are used by the null modem connection? 

11. True or false. The longer the cable, the higher the data transfer baud rate. 

12. The function definition of the RS232 pins is stated from the point of view of
______ (DTE, DCE).

13. If two PCs are connected through the RS232 without the modem, they are both con- 
figured as a (DTE, DCE) -to- (DTE, DCE) connection. 

14. State the most important signals of the RS232 used for the null modem connection. 

15. Calculate the total number of bits transferred if 200 pages of ASCII data are sent using 
asynchronous serial data transfer. Assume a data size of 8 bits, 1 stop bit, no parity. 


SECTION 17.2: PROGRAMMING x86 PC COM PORTS USING ASSEMBLY AND C# 


16. In the IBM PC, what is the maximum number of COM ports that can be installed? Use 
System Tools to find which COM port is installed on your PC. 

17. What are the highest baud rates supported by the x86 PC COM ports using BIOS pro- 
gramming? 

18. Show the COM2 port setting of 9600 baud rate, 8 data bits, 1 stop bit, no parity bit,
using BIOS INT 14H.

19. Modify Program 17-1 to display the key pressed not only on the receiving PC's mon- 
itor but also on the monitor of the PC that sent it. 


ANSWERS TO REVIEW QUESTIONS 


SECTION 17.1: BASICS OF SERIAL COMMUNICATION 


1. Faster, more expensive 
2. Serial 

3. Synchronous 

4. False; it is simplex. 

5. True 

6. Asynchronous 




7. With 100 0100 binary we have 1 as the odd-parity bit. The bits as transmitted in the
sequence (least significant data bit first) are:

(a) 0 (start bit)
(b) 0
(c) 0
(d) 1
(e) 0
(f) 0
(g) 0
(h) 1
(i) 1 (parity)
(j) 1 (first stop bit)
(k) 1 (second stop bit)
8. 4 bits 


9. 400 x 11 = 4400 total bits transmitted. 4400/1200 = 3.667 seconds, 4/7 = 57%.
10. True 


SECTION 17.2: PROGRAMMING x86 PC COM PORTS USING ASSEMBLY AND C# 


1. 4
2. 110 to 9600 (can be higher if we bypass BIOS) 
3. 110 to 19,200 (can be higher if we bypass BIOS) 




CHAPTER 18


KEYBOARD AND PRINTER INTERFACING


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Diagram how a keyboard matrix is connected to the I/O ports of a PC
>> Describe the processes of key press detection and key identification
   performed by a microprocessor with pseudocode or flowchart
>> Describe the respective functions of the keyboard microcontroller,
   INT 09, and the motherboard in keyboard input
>> Code Assembly language instructions using INT 16H to get and check the
   keyboard input buffer and status bytes
>> Diagram how data from the keyboard is stored in the keyboard buffer
>> State the differences between hard contact and capacitance keyboards
>> List the 36-pin assignments of the Centronics printer interface
>> Describe the BIOS programming for the four parallel printer ports
   LPT1-LPT4
>> Diagram the I/O port assignment for printer data, status, and control
>> Code Assembly language instructions using INT 17H to check printer
   status, initialize printers, and send data to the printer
>> Describe the printer time-out problem and how it can be alleviated
>> Discuss the evolution of the PC's parallel port
>> Contrast and compare parallel port types, including SPP, PS/2, EPP, and
   ECP




Along with video monitors, keyboards and printers are the most widely used 
input/output devices of the PC, and a basic understanding of them is essential. In this chap- 
ter, we discuss fundamentals of the keyboard and printer. 


SECTION 18.1: INTERFACING THE KEYBOARD TO THE 
CPU 


At the lowest level, keyboards are organized in a matrix of rows and columns. The 
CPU accesses both rows and columns through ports; therefore, with two 8-bit ports, an 8 
x 8 matrix of keys can be connected to a microprocessor. When a key is pressed, a row and 
a column make a contact; otherwise, there is no connection between rows and columns. In 
IBM PC keyboards, a single microcontroller (consisting of a microprocessor, RAM and 
EPROM, and several ports all on a single chip) takes care of hardware and software inter- 
facing of the keyboard. In such systems, it is the function of programs stored in the 
EPROM of the microcontroller to scan the keys continuously, identify which one has been 
activated, and present it to the main CPU on the motherboard. More details of the IBM 
PC keyboard design are presented in Section 18.2. In this section we look at the mecha- 
nism by which the microprocessor scans and identifies the key. For clarity we use 8088/86 
Assembly language instructions in examples. 


Scanning and identifying the key 


Figure 18-1 shows a 4 x 4 matrix connected to two ports. The rows are connected
to an output port and the columns are connected to an input port. If no key has been
pressed, reading the input port will yield 1s for all columns since they are all connected to
high (VCC). If all the rows are grounded and a key is pressed, one of the columns will
have 0 since the key pressed provides the path to ground. It is the function of the
microprocessor to scan the keyboard continuously to detect and identify the key pressed. How
it is done is explained next.


Figure 18-1. Matrix Keyboard Connection to Ports


Grounding rows and reading the columns 


To detect the key pressed, the microprocessor grounds all rows by providing 0 to
the output latch, then it reads the columns. If the data read from the columns is D3-D0 =
1111, no key has been pressed and the process continues until a key press is detected.
However, if one of the column bits has a zero, this means that a key press has occurred.
For example, if D3-D0 = 1101, this means that a key in the D1 column has been pressed.
After a key press is detected, the microprocessor will go through the process of identifying
the key. Starting with the top row, the microprocessor grounds it by providing a low to
row D0 only; then it reads the columns. If the data read is all 1s, no key in that row is
activated and the process is moved to the next row. It grounds the next row, reads the columns,
and checks for any zero. This process continues until the row is identified. After identification
of the row in which the key has been pressed, the next task is to find out which column
the pressed key belongs to. This should be easy since the CPU knows at any time
which row and column are being accessed. Look at Example 18-1.


Example 18-1 


From Figure 18-1, identify the row and column of the pressed key for each of the following. 
(a) D3-D0 = 1110 for the row, D3—D0 = 1011 for the column 
(b) D3-D0 = 1101 for the row, D3—D0 = 0111 for the column 


Solution: 


From Figure 18-1 the row and column can be used to identify the key. 

(a) The row belongs to D0 and the column belongs to D2; therefore, the key number 2 was
pressed.

(b) The row belongs to D1 and the column belongs to D3; therefore, the key number 7 was
pressed.
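
The reasoning in Example 18-1 can be written as a short calculation. The Python sketch below assumes the keys are numbered row by row with four keys per row, which matches the two answers above; the names low_bit_index and key_number are hypothetical.

```python
def low_bit_index(pattern, width=4):
    """Return the position of the first 0 bit in an active-low pattern."""
    for bit in range(width):
        if not pattern & (1 << bit):
            return bit
    raise ValueError("no line is low")


def key_number(row_pattern, col_pattern, columns=4):
    """Combine the grounded row and the low column into a key number."""
    row = low_bit_index(row_pattern)
    col = low_bit_index(col_pattern)
    return row * columns + col


if __name__ == "__main__":
    print(key_number(0b1110, 0b1011))  # case (a)
    print(key_number(0b1101, 0b0111))  # case (b)
```

Case (a) gives row 0 and column 2, that is, key 2; case (b) gives row 1 and column 3, that is, key 7.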


Program 18-1 is the Assembly language program for the detection and identifica- 
tion of the key activation. In this program, it is assumed that PORT_A and PORT_B are 
initialized as output and input, respectively. Program 18-1 goes through the following four 
major stages: 


1. To make sure that the preceding key has been released, 0s are output to all rows at
   once, and the columns are read and checked repeatedly until all the columns are
   high. When all columns are found to be high, the program waits for a short amount
   of time before it goes to the next stage of waiting for a key to be pressed.

2. To see if any key is pressed, the columns are scanned over and over in an infinite
   loop until one of them has a 0 on it. Remember that the output latches connected to
   rows still have their initial zeros (provided in stage 1), making them grounded.
   After the key press detection, it waits 20 ms for the bounce and then scans the
   columns again. This serves two functions: (a) it ensures that the first key press
   detection was not an erroneous one due to a spike noise, and (b) the 20 ms delay
   prevents the same key press from being interpreted as a multiple key press. If after
   the 20 ms delay the key is still pressed, it goes to the next stage to detect which row
   it belongs to; otherwise, it goes back into the loop to detect a real key press.

3. To detect which row the key press belongs to, it grounds one row at a time, reading
   the columns each time. If it finds that all columns are high, this means that the key
   press cannot belong to that row; therefore, it grounds the next row and continues
   until it finds the row the key press belongs to. Upon finding the row that the key
   press belongs to, it sets up the starting address for the lookup table holding the scan
   codes for that row and goes to the next stage to identify the key.

4. To identify the key press, it rotates the column bits, one bit at a time, into the carry
   flag and checks to see if it is low. Upon finding the zero, it pulls out the scan code
   for that key from the lookup table; otherwise, it increments the pointer to point to
   the next element of the lookup table.


Figure 18-2. Flowchart for Program 18-1
(Flowchart boxes, in order: Ground all rows; Read all columns; Wait for debounce;
Read all columns; Ground next row; Read all columns; Key press in this row?;
Find which key is pressed; Get scan code from table)


While the key press detection is standard for all keyboards, the process for deter- 
mining which key is pressed varies. The lookup table method shown in Program 18-1 can 
be modified to work with any matrix up to 8 x 8. Figure 18-2 provides the flowchart for 
Program 18-1 for scanning and identifying the pressed key. 


;the following look-up scan codes are in the data segment
KCOD_0  DB   0,1,2,3             ;key codes for row zero
KCOD_1  DB   4,5,6,7             ;key codes for row one
KCOD_2  DB   8,9,0AH,0BH         ;key codes for row two
KCOD_3  DB   0CH,0DH,0EH,0FH     ;key codes for row three

;the following is from the code segment
        PUSH BX                  ;save BX
        SUB  AL,AL               ;AL=0 to ground all rows at once
        OUT  PORT_A,AL           ;to ensure all keys are open (no contact)
K1:     IN   AL,PORT_B           ;read the columns
        AND  AL,00001111B        ;mask the unused bits (D7-D4)
        CMP  AL,00001111B        ;are all keys released
        JNE  K1                  ;keep checking for all keys released
        CALL DELAY               ;wait for 20 ms
K2:     IN   AL,PORT_B           ;read columns
        AND  AL,00001111B        ;mask D7-D4
        CMP  AL,00001111B        ;see if any key pressed?
        JE   K2                  ;if none keep checking
        CALL DELAY               ;wait 20 ms for debounce
;after the debounce see if still pressed
        IN   AL,PORT_B           ;read columns
        AND  AL,00001111B        ;mask D7-D4
        CMP  AL,00001111B        ;see if any key closed?
        JE   K2                  ;if none keep polling
;now ground one row at a time and read columns to find the key
        MOV  AL,11111110B        ;ground row 0 (D0=0)
        OUT  PORT_A,AL
        IN   AL,PORT_B           ;read all columns
        AND  AL,00001111B        ;mask unused bits (D7-D4)
        CMP  AL,00001111B        ;see which column
        JE   RO_1                ;if none go to grounding row 1
        MOV  BX,OFFSET KCOD_0    ;BX=start of table for row 0 keys
        JMP  FIND_IT             ;identify the key
RO_1:   MOV  AL,11111101B        ;ground row 1 (D1=0)
        OUT  PORT_A,AL
        IN   AL,PORT_B           ;read all columns
        AND  AL,00001111B        ;mask unused bits (D7-D4)
        CMP  AL,00001111B        ;see which column
        JE   RO_2                ;if none go to grounding row 2
        MOV  BX,OFFSET KCOD_1    ;BX=start of table for row 1 keys
        JMP  FIND_IT             ;identify the key
RO_2:   MOV  AL,11111011B        ;ground row 2 (D2=0)
        OUT  PORT_A,AL
        IN   AL,PORT_B           ;read all columns
        AND  AL,00001111B        ;mask unused bits (D7-D4)
        CMP  AL,00001111B        ;see which column
        JE   RO_3                ;if none go to grounding row 3
        MOV  BX,OFFSET KCOD_2    ;BX=start of table for row 2 keys
        JMP  FIND_IT             ;identify the key
RO_3:   MOV  AL,11110111B        ;ground row 3 (D3=0)
        OUT  PORT_A,AL
        IN   AL,PORT_B           ;read all columns
        AND  AL,00001111B        ;mask unused bits (D7-D4)
        CMP  AL,00001111B        ;see which column
        JE   K2                  ;if none then false input repeat the process
        MOV  BX,OFFSET KCOD_3    ;BX=start of table for row 3 keys
;A key press has been detected and the row identified.
;Now find which key.
FIND_IT: RCR AL,1                ;rotate the column input to search for 0
        JNC  MATCH               ;if zero, go get the code
        INC  BX                  ;if not point at the next code
        JMP  FIND_IT             ;and keep searching
;GET THE CODE FOR THE KEY PRESSED AND RETURN
MATCH:  MOV  AL,[BX]             ;get the code pointed by BX
        POP  BX                  ;return with AL=code for pressed key
        RET

;FOR THE DELAY GENERATION SEE CHAPTER 13


Program 18-1


There are IC chips such as National Semiconductor's MM74C923 that incorporate 
keyboard scanning and decoding all in one chip. Such chips use combinations of counters 
and logic gates (no microprocessor) to implement the underlying concepts presented in 
Program 18-1. 
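
The detect, debounce, and identify sequence of Program 18-1 can also be tried out without hardware. The Python sketch below is a simplified model under stated assumptions: the Keypad class is a hypothetical stand-in for the two ports, the 20 ms delay is reduced to a second read, and identify_key returns None instead of looping until a key is pressed.

```python
KEY_CODES = [
    [0x0, 0x1, 0x2, 0x3],  # row 0 (KCOD_0)
    [0x4, 0x5, 0x6, 0x7],  # row 1 (KCOD_1)
    [0x8, 0x9, 0xA, 0xB],  # row 2 (KCOD_2)
    [0xC, 0xD, 0xE, 0xF],  # row 3 (KCOD_3)
]


class Keypad:
    """Hypothetical 4 x 4 matrix: rows driven low by the CPU, columns pulled high."""

    def __init__(self, pressed=None):
        self.pressed = pressed   # (row, col) of the closed key, or None
        self.row_latch = 0x0F

    def write_rows(self, value):     # models OUT PORT_A,AL
        self.row_latch = value & 0x0F

    def read_columns(self):          # models IN AL,PORT_B and AND AL,0FH
        cols = 0x0F
        if self.pressed is not None:
            row, col = self.pressed
            if not self.row_latch & (1 << row):  # that row is grounded
                cols &= ~(1 << col)              # the key pulls its column low
        return cols & 0x0F


def identify_key(pad):
    """Return the code of the pressed key, or None if no key is down."""
    pad.write_rows(0x0)                  # ground all rows
    if pad.read_columns() == 0x0F:       # all columns high: nothing pressed
        return None
    # A real program would wait about 20 ms here before reading again.
    if pad.read_columns() == 0x0F:
        return None
    for row in range(4):                 # ground one row at a time
        pad.write_rows(0x0F & ~(1 << row))
        cols = pad.read_columns()
        if cols == 0x0F:
            continue                     # the key is not in this row
        for col in range(4):             # examine column bits, D0 first
            if not cols & 1:
                return KEY_CODES[row][col]
            cols >>= 1
    return None


if __name__ == "__main__":
    print(identify_key(Keypad(pressed=(1, 3))))
```

The inner loop plays the role of the RCR/JNC pair: each pass examines the lowest column bit and moves on to the next table entry if that bit is high.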


Review Questions 


1. True or false. To see if any key is pressed, all rows are grounded. 

2. If D3-D0 = 0111 is the data read from the columns, which column does the key 
pressed belong to? 

3. True or false. Key press detection and key identification require two different 
processes. 

4. In Figure 18-1, if the row has D3-D0 = 1110 and the columns are D3-D0 = 1110,
which key is pressed?

5. True or false. To identify the key pressed, one row at a time is grounded. 




SECTION 18.2: PC KEYBOARD INTERFACING AND PROGRAMMING


In the IBM PC and compatibles, a microcontroller is used for both detection and 
identification of keys. This microcontroller has a microprocessor in addition to a few hun- 
dred bytes of RAM, a few kilobytes of EPROM, and a few I/O ports, all on one chip. The 
microcontroller used widely in the IBM PC and compatibles is Intel's 8042 (or some vari- 
ation). The 8042 is programmed to detect and identify the key press. A scan code is 




assigned to each key and the microcontroller provides the scan code for the pressed key to 
the motherboard. To allow the keyboard to be detachable from the system board, the key- 
board is connected to the system board through a cable. Such an arrangement necessitates 
the use of serial data communication to transfer the scan code to the main CPU (serial data 
transfer was covered in Chapter 17). IBM PC AT keyboards use the following data frame 
when sending the scan code serially to the motherboard. For each scan code, a total of 11 
bits are transferred from the keyboard to the motherboard. 


one start bit (always 0) 
8 bits for scan code 
odd parity bit 

one stop bit (always 1) 
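
The 11-bit frame can be built mechanically from a scan code. The following Python sketch is an illustration only: the function name keyboard_frame is hypothetical, and it assumes the eight data bits are sent least significant bit first.

```python
def keyboard_frame(scan_code):
    """Return the 11 bits sent for one scan code, in transmission order."""
    data = [(scan_code >> i) & 1 for i in range(8)]  # D0 first (assumed order)
    parity = 1 - (sum(data) % 2)   # odd parity: data bits plus parity hold an odd number of 1s
    return [0] + data + [parity] + [1]               # start, data, parity, stop


if __name__ == "__main__":
    print(keyboard_frame(0x1E))  # scan code 1EH, the A key
```

For scan code 1EH (0001 1110 binary) the data bits contain four 1s, so the parity bit is 1 and the nine bits between start and stop hold an odd number of 1s.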


In the original PC motherboard, a serial-in-parallel-out shift register, 74LS322, 
was used to receive the serial data coming in through the keyboard cable. The 74LS322 
strips away the framing portion, makes an 8-bit scan code, and presents it to port A of the 
8255 with I/O port address of 60H. In the subsequent x86 PCs, the 74LS322 and support- 
ing logic were replaced by another 8042. This allows the option of programming the key- 
board itself. Therefore, two 8042 microcontrollers, one on the keyboard and one on the 
motherboard, are responsible for keyboard bidirectional communication in the x86 PC. 


Make and break 


In the IBM PC, the key press and release are represented by two different scan
codes. The key press is referred to as a make, and the release of the same key is called a
break. When a key is pressed (a make), the keyboard sends one scan code, and when it is
released (a break), it sends another scan code. The scan code for the break is always 128
decimal (80H) larger than the make scan code. For example, if a given key produces a scan
code of 06 on make, the scan code for the break is 86H (06 + 80H = 86H).
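
Because the make codes are all below 80H, adding 80H is the same as setting bit D7, and a receiver can tell make from break by testing that one bit. A minimal Python sketch follows; the names break_code and classify are hypothetical.

```python
BREAK_BIT = 0x80  # D7 is set in every break code


def break_code(make_code):
    """Return the break scan code that corresponds to a make scan code."""
    return make_code | BREAK_BIT


def classify(scan_code):
    """Return ('make' or 'break', make code of the key)."""
    kind = "break" if scan_code & BREAK_BIT else "make"
    return kind, scan_code & 0x7F


if __name__ == "__main__":
    print(hex(break_code(0x06)))
    print(classify(0x86))
```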


IBM PC scan codes


The original IBM PC keyboard had 83 keys, arranged in three major groupings:


1. The standard typewriter keys 
2. Ten function keys, F1 to F10 
3. 15-key keypad 


These 83 keys are shown in Table 18-1. With the introduction of the next genera- 
tion of PC, IBM added one more key, "Sys Rq", to make a total of 84 keys. The locking 
shift keys were made more noticeable by providing LED indicators for them. Later, IBM 
introduced what is called the advanced keyboard, known more commonly as the enhanced 
keyboard. The number of keys was increased to 101 for the U.S. market. Tables 18-1, 
18-2, and 18-3 provide the scan codes for both the original PC and enhanced keyboards. 

In Table 18-1, notice that the same scan code is used for a given lowercase letter 
and its capital. The same is true for all the keys with dual labels. If the scan code is the 
same for both of them, how does the system distinguish between them? This is taken care 
of by the keyboard shift status byte. The data area location 0040:0017H holds the shift sta- 
tus byte. The meaning of each bit is given in Figure 18-3. 

The BIOS data area location 0040:0018H holds the second keyboard status byte. 
The meaning of each bit is given in Figure 18-4. Notice that some of the bits are used for 
the 101-key enhanced keyboards. 

When a key is pressed, the interrupt service routine of INT 9 receives the scan 
code and stores it in a memory location called a keyboard buffer, located in the BIOS data 
area. However, to relieve programmers from the details of keyboard interaction with the 
motherboard, IBM has provided INT 16H. We first look at the services provided by the 
BIOS INT 16H and then we study the details of how the keyboard interacts with the moth- 
erboard through hardware INT 09. 




Table 18-1: PC Scan Codes for 83 PC Keys


Hex  Key           Hex  Key           Hex  Key            Hex  Key
01   Esc           16   U and u       2B   | and \        40   F6
02   ! and 1       17   I and i       2C   Z and z        41   F7
03   @ and 2       18   O and o       2D   X and x        42   F8
04   # and 3       19   P and p       2E   C and c        43   F9
05   $ and 4       1A   { and [       2F   V and v        44   F10
06   % and 5       1B   } and ]       30   B and b        45   Num Lock
07   ^ and 6       1C   Enter         31   N and n        46   Scroll Lock
08   & and 7       1D   Ctrl          32   M and m        47   7 and Home
09   * and 8       1E   A and a       33   < and ,        48   8 and Up Arrow
0A   ( and 9       1F   S and s       34   > and .        49   9 and Pg Up
0B   ) and 0       20   D and d       35   ? and /        4A   - (gray)
0C   _ and -       21   F and f       36   Right Shift    4B   4 and Left Arrow
0D   + and =       22   G and g       37   PrtSc and *    4C   5 (keypad)
0E   Backspace     23   H and h       38   Alt            4D   6 and Right Arrow
0F   Tab           24   J and j       39   Spacebar       4E   + (gray)
10   Q and q       25   K and k       3A   Caps Lock      4F   1 and End
11   W and w       26   L and l       3B   F1             50   2 and Down Arrow
12   E and e       27   : and ;       3C   F2             51   3 and Pg Dn
13   R and r       28   " and '       3D   F3             52   0 and Ins
14   T and t       29   ~ and `       3E   F4             53   . and Del
15   Y and y       2A   Left Shift    3F   F5


(Reprinted by permission from "IBM BIOS Technical Reference" © 1987 by International Business Machines Corporation)


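d7 | d6 | d5 | d4 | d3 | d2 | d1 | d0

d7  Insert active                 d3  Alt pressed
d6  Caps Lock active              d2  Ctrl pressed
d5  Num Lock active               d1  Left Shift pressed
d4  Scroll Lock active            d0  Right Shift pressed


Figure 18-3. First Keyboard Status Byte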




Table 18-2: Combination Key Scan Codes


Hex  Keys         Hex  Keys        Hex  Keys               Hex  Keys
54   Shift F1     60   Ctrl F3     6C   Alt F5             78   Alt 1
55   Shift F2     61   Ctrl F4     6D   Alt F6             79   Alt 2
56   Shift F3     62   Ctrl F5     6E   Alt F7             7A   Alt 3
57   Shift F4     63   Ctrl F6     6F   Alt F8             7B   Alt 4
58   Shift F5     64   Ctrl F7     70   Alt F9             7C   Alt 5
59   Shift F6     65   Ctrl F8     71   Alt F10            7D   Alt 6
5A   Shift F7     66   Ctrl F9     72   Ctrl PrtSc         7E   Alt 7
5B   Shift F8     67   Ctrl F10    73   Ctrl Left Arrow    7F   Alt 8
5C   Shift F9     68   Alt F1      74   Ctrl Right Arrow   80   Alt 9
5D   Shift F10    69   Alt F2      75   Ctrl End           81   Alt 0
5E   Ctrl F1      6A   Alt F3      76   Ctrl PgDn          82   Alt -
5F   Ctrl F2      6B   Alt F4      77   Ctrl Home          83   Alt =


(Reprinted by permission from "IBM BIOS Technical Reference" © 1987 by International Business Machines Corporation)


Table 18-3: Extended Keyboard Scan Codes


Hex  Keys            Hex  Keys               Hex  Keys               Hex  Keys
85   F11             8E   Ctrl -             97   Alt Home           A0   Alt Down Arrow
86   F12             8F   Ctrl 5             98   Alt Up Arrow       A1   Alt Page Down
87   Shift F11       90   Ctrl +             99   Alt Page Up        A2   Alt Insert
88   Shift F12       91   Ctrl Down Arrow    --   --                 A3   Alt Delete
89   Ctrl F11        92   Ctrl Insert        9B   Alt Left Arrow     A4   Alt /
8A   Ctrl F12        93   Ctrl Delete        --   --                 A5   Alt Tab
8B   Alt F11         94   Ctrl Tab           9D   Alt Right Arrow    A6   Alt Enter
8C   Alt F12         95   Ctrl /             --   --
8D   Ctrl Up Arrow   96   Ctrl *             9F   Alt End


(Reprinted by permission from "IBM BIOS Technical Reference" © 1987 by International Business Machines Corporation)


d7 | d6 | d5 | d4 | d3 | d2 | d1 | d0

d7  Insert pressed                d3  Ctrl/Num Lock (pause) pressed
d6  Caps Lock pressed             d2  Sys Req pressed
d5  Num Lock pressed              d1  Left Alt pressed
d4  Scroll Lock pressed           d0  Left Ctrl pressed


Figure 18-4. Second Keyboard Status Byte




BIOS INT 16H keyboard programming 


INT 16H, AH = 0 (read a character) 


This option checks the keyboard buffer for a character. If a character is available, 
it returns its scan code in AH and its ASCII code in AL. If no character is available in the 
buffer, it waits for a key press and returns it. For characters such as F1—F10 for which there 
is no ASCII code, it simply provides the scan code in AH and AL = 0. Therefore, if AL = 
0, a special function key was pressed. This option simply provides the code for the char- 
acter and does not display it. See Example 18-2. 


Example 18-2 


Run the following program in DEBUG. Interpret the result after typing each of the following. Run
the program for each separately. (a) Z (b) F1 (c) ALT

        MOV  AH,0
        INT  16H
        INT  3

Solution:
(a) AX = 2C7A  AH = 2C, the scan code, and AL = 7A, ASCII for 'z' (the Z key typed without Shift)
(b) AX = 3B00  AH = 3B, the scan code for F1, and AL = 00, because F1 is not an ASCII key
(c) Nothing happens because pressing Alt by itself places no code in the keyboard buffer, so
    INT 16H keeps waiting. The status of keys such as Alt is found in the keyboard status
    bytes stored in the BIOS data area at 40:17 and 40:18.
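
The interpretation of AX used in this example can be expressed as a small helper. The Python sketch below is illustrative only; split_ax and describe_key are hypothetical names, and the value passed in stands for the AX contents shown by DEBUG.

```python
def split_ax(ax):
    """Split a 16-bit AX value into (AH, AL)."""
    return (ax >> 8) & 0xFF, ax & 0xFF


def describe_key(ax):
    """Interpret AX as returned by INT 16H, AH = 0."""
    scan, ascii_code = split_ax(ax)
    if ascii_code == 0:                       # AL = 0: no ASCII code for this key
        return f"special key, scan code {scan:02X}H"
    return f"'{chr(ascii_code)}', scan code {scan:02X}H"


if __name__ == "__main__":
    print(describe_key(0x2C7A))
    print(describe_key(0x3B00))
```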


INT 16H, AH = 01 (find if a character is available) 


This option, which is similar to the AH = 0 option, checks the keyboard buffer for 
a character. If a character is available, it returns its scan code in AH and its ASCII code in
AL and sets ZF = 0. If no character is available in the buffer, it does not wait for a key 
press and simply makes ZF = 1 to indicate that. 


INT 16H, AH = 02 (return the current keyboard status byte) 


This option provides the keyboard status byte in the AL register. The keyboard sta- 
tus byte (also referred to as the keyboard flag byte) is located in the BIOS data area mem- 
ory location 0040:0017H. For the meaning of each bit of the shift status byte, see Figure 
18-3 and Example 18-3. 


Example 18-3 


Run the following program in DEBUG while the right shift key is held down and the Caps Lock key
light is on. Verify it also by dumping the 0040:0017 location using DEBUG.

        MOV  AH,02
        INT  16H
        INT  3


Solution:


Running the program while the Right Shift and Caps Lock keys are activated gives


AL = 41H = 0100 0001 in binary, which can be checked against Figure 18-3. In DEBUG,
-d 0:417 418


will provide the keyboard status byte 41-00.
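
The value 41H can be checked bit by bit. The Python sketch below uses the standard bit assignments of the first keyboard status byte at 0040:0017H; the list name and the function decode_shift_status are hypothetical and serve only as an illustration.

```python
SHIFT_STATUS_BITS = [
    "Right Shift pressed",  # d0
    "Left Shift pressed",   # d1
    "Ctrl pressed",         # d2
    "Alt pressed",          # d3
    "Scroll Lock active",   # d4
    "Num Lock active",      # d5
    "Caps Lock active",     # d6
    "Insert active",        # d7
]


def decode_shift_status(status):
    """List the conditions whose bits are set in the status byte."""
    return [name for bit, name in enumerate(SHIFT_STATUS_BITS) if status & (1 << bit)]


if __name__ == "__main__":
    print(decode_shift_status(0x41))
```

For 41H only d6 and d0 are set, which corresponds to Caps Lock on and Right Shift held down.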


Due to additional keys on the IBM extended keyboard, BIOS added the following 
additional services to INT 16H. 


INT 16H, AH = 10H (read a character) 


This is the same as AH = 0 except that it also accepts the additional keys on the 
IBM extended (enhanced) keyboard. 




INT 16H, AH = 11H (find if a character is available)


This is the same as AH = 1 except that it also accepts the additional keys on the
IBM extended (enhanced) keyboard.


INT 16H, AH = 12H (return the current status byte)


This is the same as AH = 2 except that it also provides the shift status byte of the
IBM extended (enhanced) keyboard to AH. See Figure 18-5 and Example 18-4.


Example 18-4 


Run the following program in DEBUG. Interpret the result after typing each of the following. Run
the program for each separately. (a) F11 (b) ALT F11 (c) ALT TAB

        MOV  AH,10H
        INT  16H
        INT  3

Solution:
After running the program above in DEBUG for each case, we have the following:
(a) AX = 8500, where 85H is the scan code for F11
(b) AX = 8B00, where 8BH is the scan code for Alt-F11
(c) AX = A500, where A5H is the scan code for Alt-Tab
All of the cases above have AL = 00 since there is no ASCII code for these keys.


d7 | d6 | d5 | d4 | d3 | d2 | d1 | d0

d7  Sys Req pressed               d3  Right Alt pressed
d6  Caps Lock pressed             d2  Right Ctrl pressed
d5  Num Lock pressed              d1  Left Alt pressed
d4  Scroll Lock pressed           d0  Left Ctrl pressed


Figure 18-5. Enhanced Keyboard Shift Status Byte


Example 18-5 


Write and test a program in DEBUG that increments counter CX whenever Shift-F7 is activated; 
otherwise, it should exit. 

Solution: 

The unassembled program in DEBUG follows. The shift status is returned in AL, so bits
D0 and D1 of AL (Right Shift and Left Shift) are tested.

-U 100 113
16B7:0100 B402      MOV   AH,02
16B7:0102 CD16      INT   16
16B7:0104 F6C003    TEST  AL,03
16B7:0107 740A      JZ    0113
16B7:0109 B400      MOV   AH,00
16B7:010B CD16      INT   16
16B7:010D 80FC5A    CMP   AH,5A
16B7:0110 7501      JNZ   0113
16B7:0112 41        INC   CX
16B7:0113 CC        INT   3




Hardware INT 09 role in the IBM PC keyboard 


To understand fully the principles underlying the IBM PC keyboard, it is necessary
to know how INT 09 works. The IBM PC keyboard communicates with the motherboard
through hardware interrupt IRQ1 of the 8259. As mentioned in Chapter 14, IRQ1
(INT 09) of the 8259 is used by the keyboard. The way the INT 09 interrupt service works
is as follows.


1. The keyboard microcontroller scans the keyboard matrix continuously. When a key 
is pressed (a make), it is identified and its scan code is sent serially to the mother- 
board through the keyboard cable (see Figure 18-6). The circuitry on the mother- 
board receives the serial bits, gets rid of the frame bits, and makes one byte (scan 
code) with the help of its serial-in-parallel-out shift register, then presents this 8-bit 
scan code to port A of the 8255 at I/O address of 60H, and finally activates IRQ1. 

2. Since IRQ1 is set to INT 09, its interrupt service routine (ISR) residing in BIOS
ROM is invoked.

3. The ISR of INT 09 reads the scan code from port 60H. 

4. The ISR of INT 09 tests the scan code to see if it belongs to one of the shift keys 
(Right Shift and Left Shift), Alt, Ctrl keys, and so on. If it does, the appropriate bit 
of the keyboard status bytes in BIOS memory locations 0040:0017H and 0018H are 
set. However, it will not write the scan code to the keyboard buffer. If the scan code 
belongs to any key other than a special key (Shift, Alt, Ctrl, and so on), INT 09 
checks to see if there is an ASCII code for the key. If there is one, it will write both 
the ASCII and scan codes into the keyboard buffer. If there is no ASCII code for the 
key, it puts 00 in place of ASCII code and the scan code in the keyboard buffer. 

5. Before returning from INT 09, the ISR will issue EOI to unmask IRQ1, followed by
the IRET instruction. This allows IRQ1 activation to be responded to again.

6. When the key is released (a break), the keyboard generates the second scan code by 
adding 80H to it and sends it to the motherboard. 

7. The ISR of INT 09 checks the scan code to see if there is 80H difference between 
the last code and this one. This is easy since all it has to do is to test D7 (80H = 
10000000 binary). If D7 is high, this is interpreted as meaning that the key has been 
released and the system ignores the second scan code. However, if the key is held 
down more than 0.5 seconds, it is interpreted as a new key and INT 09 will write it 
into the keyboard buffer next to the preceding one. Holding down the key for more 
than 0.5 seconds is commonly referred to as typematic in IBM literature, which 
means repeating the same key. 
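
The decisions made by the INT 09 service routine in steps 3 through 7 can be modelled in a few lines. The Python sketch below is a simplified illustration, not the BIOS code: KeyboardState is a hypothetical class, the ASCII table holds only two sample keys, shifted ASCII translation and typematic repeat are left out, and clearing a status bit on release is an assumption of the sketch (the steps above mention only setting it).

```python
# Scan code of each shift-type key mapped to its bit in the status byte
SHIFT_KEYS = {0x36: 0x01, 0x2A: 0x02, 0x1D: 0x04, 0x38: 0x08}
# Tiny sample of the scan-code-to-ASCII translation (unshifted)
ASCII_TABLE = {0x1E: ord("a"), 0x2C: ord("z")}


class KeyboardState:
    def __init__(self):
        self.status = 0    # models the status byte at 0040:0017H
        self.buffer = []   # models the keyboard buffer: (scan, ASCII) pairs

    def int09(self, scan_code):
        """Handle one scan code read from port 60H."""
        released = bool(scan_code & 0x80)   # D7 high marks a break code
        key = scan_code & 0x7F
        if key in SHIFT_KEYS:
            mask = SHIFT_KEYS[key]
            if released:
                self.status &= ~mask & 0xFF  # assumption: bit cleared on release
            else:
                self.status |= mask          # shift keys only update the status
            return
        if released:
            return                           # break codes of other keys are ignored
        # Keys without an ASCII code are stored with 00 in its place
        self.buffer.append((key, ASCII_TABLE.get(key, 0)))


if __name__ == "__main__":
    kb = KeyboardState()
    for code in (0x36, 0x2C, 0xAC, 0x3B, 0xB6):
        kb.int09(code)
    print(kb.buffer, kb.status)
```

Feeding it Right Shift make, Z make, Z break, F1 make, and Right Shift break leaves two entries in the buffer, (2CH, 7AH) and (3BH, 00), and returns the status byte to zero.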


From the above steps the following points must be emphasized: 


1. The keyboard sends two separate scan codes for make and break to the mother- 
board. 

2. It is the function of the ISR of INT 09 to read the scan code sent by the keyboard 
and convert it to an ASCII code (if any), then save both the scan code and ASCII 
code in the keyboard buffer of the motherboard. 

3. If any of the special keys, such as Shift, Alt, or Ctrl, is pressed, INT 09 sets the 
appropriate bits to 1 in the BIOS data area of 0040:0017H and 0018H, but it will 
not deliver the scan code to the keyboard buffer. 

4. If any undefined combination of keys is pressed, INT 09 is activated but it will
ignore it since there is no associated scan code. If such key combinations are used
by a given program, it is the job of the programmer to intercept them by hooking
into INT 09.




Figure 18-6. Keyboard Cable Jack for the PC
(Pin 1: keyboard clock; pin 2: keyboard data; pin 3: unused; pin 4: ground; pin 5: +5.0 volts)


Keyboard overrun


On the keyboard side, the 8042 circuitry must serialize the scan code and send it 
through the cable to the motherboard. On the motherboard side, there is circuitry respon- 
sible for getting the serial data and making a single byte of scan code out of the streams of 
bits, and holding it for the CPU to read. What happens if the CPU falls behind and cannot 
keep up with the number of keystrokes? Such a situation is called keyboard overrun. The 
motherboard beeps the speaker when an overrun occurs. The beeping process works as fol- 
lows. The circuitry on the keyboard has a buffer of its own to store a maximum of 20 key 
strokes. When this buffer becomes full, it stops receiving keystrokes and sends a special 
byte called an overrun byte (which is FFH in the PC) to the motherboard. After getting the 
scan code, INT 09 first checks to see if the scan code received is the overrun byte, FFH. 
If it is, it will sound the speaker; otherwise, it tests for the shift keys and so on, as 
explained earlier. In other words, the BIOS ROM on the motherboard is responsible for 
beeping the speaker in the event of keyboard overrun. The following program shows this 
process. It provides the beginning and ending codes for the INT 09 interrupt service rou- 
tine, taken from the IBM PC BIOS with some modification for the sake of clarity. 


;KEYBOARD INT 09 INTERRUPT ROUTINE for PC
KB_INT  PROC  FAR
        STI                ;ALLOW FURTHER INTERRUPT
        PUSH  AX           ;SAVE ALL THESE
        PUSH  BX           ;REGISTERS
        PUSH  CX
        PUSH  DX
        PUSH  SI
        PUSH  DI
        PUSH  DS
        PUSH  ES
        IN    AL,60H       ;READ IN THE CHARACTER FROM PORT
        CMP   AL,0FFH      ;OVERRUN CHARACTER?
        JNZ   K16          ;NO, GO ON TO TEST FOR THE SHIFT KEYS
        JMP   K62          ;SOUND THE BEEPER FOR BUFFER FULL

;(the middle of the routine is not shown)

        CLI                ;TURN OFF INTERRUPTS
        MOV   AL,20H       ;ISSUE EOI (END OF INTERRUPT) TO 8259
        OUT   20H,AL       ;AT PORT ADDRESS 20H
        POP   ES
        POP   DS
        POP   DI
        POP   SI
        POP   DX
        POP   CX           ;RESTORE ALL THE
        POP   BX           ;REGISTERS
        POP   AX
        IRET               ;RETURN FROM INTERRUPT
KB_INT  ENDP


Keyboard buffer in BIOS data area 


As mentioned above, the INT 09 interrupt routine gets the scan code from the key- 
board and stores it in some memory locations in the BIOS data area. These memory loca- 
tions are referred to as the BIOS keyboard buffer. This keyboard buffer in the BIOS data 
area should not be confused with the buffer inside the keyboard itself, whose overrun caus- 
es the speaker to beep. 

If there is an ASCII code, INT 09 also stores the ASCII code for the key in the 
keyboard buffer; otherwise, it puts 0 there instead. Where this keyboard buffer is located 
and how it is used by INT 09 are discussed next. 


BIOS keyboard buffer 


A total of 32 bytes (16 words) of memory in the BIOS data area is set aside for the
keyboard buffer. It starts at memory address 40:001EH and goes to 40:003DH, which
corresponds to physical addresses 0041EH through 0043DH. Each two consecutive locations
are used for a single character, one for the scan code and the other one for the ASCII
code (if any) of the character. How does INT 09 know in which word of this 16-word buffer
it should put the next character, and how does INT 16H know which of the characters in the
keyboard buffer to extract? To answer these questions, we must explain the role of
keyboard buffer pointers. There are two keyboard buffer pointers: the head pointer and
the tail pointer. See Table 18-4.


Table 18-4: BIOS Data Area Used by Keyboard Buffer

Address of Head Pointer    Address of Tail Pointer    Keyboard Buffer
41A and 41B                41C and 41D                41E to 43D


Tail pointer


Memory locations 0040:001CH and 0040:001DH (physical addresses 0041CH
and 0041DH) hold the address for the tail. This means that at any given time, memory
locations 0041CH and 0041DH hold the address where INT 09 should store the next
character. It is the job of INT 09 to put the character in the keyboard buffer and advance
the tail by incrementing the word contents of memory location 0041CH, where the tail
pointer is held.


Head pointer


INT 16H gets the address of where to extract the next character from memory 
locations 41AH and 41BH, the head pointer. As INT 16H reads each character from the 
keyboard buffer, it advances the head pointer held by memory locations 41AH and 41BH. 

The above discussion can be summarized as follows. As INT 09 inserts the character
into the keyboard buffer it advances the tail, and as INT 16H reads the character from
the keyboard buffer it advances the head. When they come to the end of the keyboard
buffer they both wrap around, creating a ring of 16 words where the head is continuously
chasing the tail. This is shown in Figure 18-7.

Notice in Figure 18-7 that if the keyboard buffer is empty, the head address is 
equal to the tail address. As INT 09 inserts characters into the buffer, the tail is moved. If 
the buffer is not read by INT 16H, it becomes full, which causes the tail to be right behind 
the head. Look at Example 18-6. 
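
The head-and-tail ring can be modeled in a few lines. The sketch below is written in Python purely as an illustration; the class and method names are our own, not BIOS names, and the pointers are kept as offsets within segment 0040H. It assumes the "full" rule described above (the tail sits right behind the head), which in this model leaves room for 15 characters in the 16-word ring.

```python
BUF_START = 0x1E   # offset of the first buffer word (0040:001EH)
BUF_END = 0x3E     # one past the last buffer byte (0040:003DH)


class KeyboardBuffer:
    """Toy model of the 16-word BIOS keyboard ring buffer."""

    def __init__(self):
        self.head = BUF_START   # models the word held at 0040:001AH
        self.tail = BUF_START   # models the word held at 0040:001CH
        self.words = {}         # offset -> (ascii_code, scan_code)

    @staticmethod
    def _advance(ptr):
        """Move a pointer to the next word, wrapping at the buffer end."""
        ptr += 2
        return BUF_START if ptr == BUF_END else ptr

    def is_empty(self):
        return self.head == self.tail

    def put(self, ascii_code, scan_code):
        """Role of INT 09: store at the tail, then advance the tail."""
        nxt = self._advance(self.tail)
        if nxt == self.head:        # tail right behind head: buffer full
            return False
        self.words[self.tail] = (ascii_code, scan_code)
        self.tail = nxt
        return True

    def get(self):
        """Role of INT 16H: read at the head, then advance the head."""
        if self.is_empty():
            return None
        entry = self.words[self.head]
        self.head = self._advance(self.head)
        return entry
```

Keeping one word unused is what lets "head equals tail" mean empty without any separate counter.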


Figure 18-7. Keyboard Buffer Head and Tail 
Example 18-6 


Using DEBUG, dump the location where the head and tail pointers are held. Compare them. Is the 
buffer full or empty? 


Solution: 


C>DEBUG 
-D 0:410 41F 
0000:0410 63 44 F0 80 02 00 01 40-00 00 3C 00 3C 00 20 39 cD.....@..<.<. 9


In this case, the head and tail pointers point to the same location; therefore, the buffer is empty. 
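
To see how the two pointers are read out of the dump, the following Python sketch (an illustration only; the variable names are ours) takes the 16 bytes shown for 0000:0410 through 0000:041F and extracts the words at 41AH and 41CH. Each pointer is stored low byte first.

```python
# The 16 bytes dumped from physical address 00410H to 0041FH.
DUMP_BASE = 0x410
dump = bytes.fromhex("63 44 F0 80 02 00 01 40 00 00 3C 00 3C 00 20 39")


def word_at(address):
    """Read a 16-bit little-endian word from the dumped region."""
    i = address - DUMP_BASE
    return int.from_bytes(dump[i:i + 2], "little")


head = word_at(0x41A)   # head pointer, bytes 3C 00
tail = word_at(0x41C)   # tail pointer, bytes 3C 00
print(f"head={head:04X}H tail={tail:04X}H empty={head == tail}")
```

Both words come out as 003CH, an offset inside the 001EH to 003DH buffer area, so the head equals the tail and the buffer is empty.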


PC keyboard technology 


The kind of keyboard shown in Figure 18-1 is referred to as a hard contact key- 
board. When a key is pressed, a physical contact between the row and column causes the 
column to be pulled to ground. Although these kinds of keyboards are cheaper to make, 
they have the disadvantage of deteriorating at the contact points of rows and columns, 
eventually becoming too stiff to use, due to metal oxidation of the contact points. The
alternative to hard contact keyboards is the capacitive keyboard. In such keyboards there is
no physical contact between the rows and columns; instead, there is a capacitor for each 
point of the matrix. In capacitive keyboards when a key is pressed, the change in capaci- 
tance is detected by a sense amplifier and produces the logic level, indicating that a key 
has been pressed. Capacitive keyboards last much longer than do hard contact keyboards. 
All IBM PC and PS machines use capacitive keyboards except the now abandoned PC Jr. 
models, which used hard contact keyboards. 


Review Questions 


1. Show the bits transferred from the keyboard to the motherboard when "J" is 
pressed. 

2. How does the PC recognize the difference between the key press and key release? 

3. Find the make and break scan codes for the letter "X" in both hex and binary. 


4. True or false. The CPU is notified through INT 09 only for key press, not for key
release.

5. Does the Alt key have a scan code? If yes, what is it?

6. True or false. The CPU stores the scan code for the right SHIFT in the buffer.

7. Find the contents of the keyboard status byte if CapsLock and Alt are pressed.

8. True or false. INT 09 is responsible for finding the ASCII code for a given key if
there is one.

9. True or false. The keyboard buffer holds both the scan code and the ASCII code for 
a given key. 

10. True or false. The beep sound indicates that the BIOS keyboard buffer is full. 

11. What does it mean when the head and tail pointers have the same values? 

12. As INT 09 puts the scan code into the keyboard buffer it advances the ____ (tail,
head) pointer.


SECTION 18.3: PRINTER AND PRINTER INTERFACING IN
THE IBM PC


In this section we describe the standard printer interface, called the Centronics 
printer interface. Then we study the IBM PC printer interfacing and provide some exam- 
ples of printer programming using BIOS INT 17H. 


Centronics printer interface pins 


The Centronics type parallel printer interface is the printer interface standard in
the x86 PC. It is also referred to as the Epson FX 100 standard. It is a 36-pin interface
connector where the pins are labeled as 1 to 36. Many of the 36 pins are used for ground,
allowing many signals to have their own ground return lines, which reduces electrical
noise. The 36 pins can be grouped as follows.


1. The data lines, which carry the data sent by the PC to the printer.

2. Printer status signals, which indicate the status of the printer at any given time.

3. Printer control signals, which are used to tell the printer what to do.

4. Ground signals, which provide an individual ground return line for each data line
and for certain control and status lines.


Data lines and grounds


Input pins DATA 1 to DATA 8 provide a parallel pathway for 8-bit data sent by the PC
to the printer. Table 18-5 describes the DB-25 printer pins. Notice in Table 18-6 that
pins 20 to 27 are used for individual ground return lines, one for each data pin.
Figure 18-8 shows the connector.


Table 18-5: DB-25 Printer Pins

Pin      Description
1        Strobe
2        Data bit 0
3        Data bit 1
4        Data bit 2
5        Data bit 3
6        Data bit 4
7        Data bit 5
8        Data bit 6
9        Data bit 7
10       Acknowledge
11       Busy
12       Out of paper
13       Select
14       Auto feed
15       Error
16       Initialize printer
17       Select input
18-25    Ground


Printer status signals


These are all output pins from the printer to the PC used by the printer to indicate
its own status. They are as follows.

PE (pin 12) is used by the printer to indicate that it is out of paper.

BUSY (pin 11) is high if the printer is not ready to accept a new character. This pin
is high when the printer is off line or when it is printing and cannot accept any data.
The PC monitors this pin continuously and as long as this pin is high, it will not
transfer data to the printer.




Table 18-6: Centronics Printer Specification

Signal   Return
Pin No.  Pin No.  Signal        Direction  Description

1        19       STROBE        IN         STROBE pulse to read data in. Pulse width must be
                                           more than 0.5 us at receiving terminal. The signal
                                           level is normally "high"; read-in of data is
                                           performed at the "low" level of this signal.

2        20       DATA 1        IN         These signals represent information of the 1st to
3        21       DATA 2        IN         8th bits of parallel data, respectively. Each
4        22       DATA 3        IN         signal is at "high" level when data is logical
5        23       DATA 4        IN         "1", and "low" when logical "0".
6        24       DATA 5        IN
7        25       DATA 6        IN
8        26       DATA 7        IN
9        27       DATA 8        IN

10       28       ACKNLG        OUT        Approximately 0.5 us pulse; "low" indicates data
                                           has been received and printer is ready for data.

11       29       BUSY          OUT        A "high" signal indicates that the printer cannot
                                           receive data. The signal becomes "high" in the
                                           following cases: (1) during data entry, (2) during
                                           printing operation, (3) in "off-line" status,
                                           (4) during printer error status.

12       30       PE            OUT        A "high" signal indicates that printer is out of
                                           paper.

13       -        SLCT          OUT        Indicates that the printer is in the state
                                           selected.

14       -        AUTO FEED XT  IN         With this signal being at "low" level, the paper
                                           is fed automatically one line after printing.
                                           (The signal level can be fixed to "low" with DIP
                                           SW pin 2-3 provided on the control circuit board.)

16       -        0V            -          Logic GND level.

17       -        CHASSIS GND   -          Printer chassis GND. In the printer, chassis GND
                                           and the logic GND are isolated from each other.

19 - 30  -        GND           -          "Twisted-pair return" signal; GND level.

31       -        INIT          IN         When this signal becomes "low" the printer
                                           controller is reset to its initial state and the
                                           print buffer is cleared. Normally at "high" level;
                                           its pulse width must be more than 50 us at
                                           receiving terminal.

32       -        ERROR         OUT        The level of this signal becomes "low" when
                                           printer is in "paper end", "off-line" and "error"
                                           state.

33       -        GND           -          Same as with pin numbers 19 to 30.

35       -                      -          Pulled up to +5 V dc through 4.7 K ohms
                                           resistance.

36       -        SLCT IN       IN         Data entry to the printer is possible only when
                                           the level of this signal is "low". (Internal
                                           fixing can be carried out with DIP SW 1-8. The
                                           condition at the time of shipment is set "low"
                                           for this signal.)

(Reprinted by permission from "IBM Technical Reference Options and Adapters" c. 1981 by
International Business Machines Corporation)


Figure 18-8. DB-25 (Male) Printer Connector
(Pins 1 to 13 in one row; pins 14 to 25 in the other row)


ERROR (pin 32) is normally a high output and is activated (goes low) when there 
are conditions such as out of paper, off line state, or jammed printhead in which the print- 
er cannot print. 

SLCT (pin 13) is active high and goes from the printer to the PC when the print- 
er is turned on and online, indicating that the printer is being selected. 

ACKNLG (pin 10) is used by the printer to acknowledge receipt of data and that 
it can accept a new character. 


Printer control signals 


STROBE (pin 1) and ACKNLG are the most widely used signals among control 
and status pins. When the PC presents a character to the data pins of the printer, it activates 
the STROBE pin of the printer, telling it that there is a byte sitting at the data pins. When 
the printer picks up the data and is ready for another byte, it sends back the ACKNLG sig- 
nal. While the STROBE is used by the CPU to tell the printer that there is a byte of data, 
it is the printer that must acknowledge the data receipt and its readiness for accepting 
another byte through the ACKNLG line. The ACKNLG signal can be used by the CPU to 
go and get another byte of data to be presented to the printer. 

INIT (pin 31) is an input into the printer and is normally high. When it is activat- 
ed (active low) it resets the printer. Upon receiving this signal, the printer goes through a 
sequence of internal initialization, including clearing its own internal buffer. 

There are two other control signals in the printer: AUTO FEED XT and SLCT IN. 
See Table 18-6 for their descriptions. The following are the steps in computer and printer 
communication. 


1. The computer checks to see if a BUSY signal from the printer indicates that the 
printer is ready (not busy). 

2. The computer puts 8-bit data on the data line connected to the printer data pins. 

3. The computer activates the STROBE pin by making it low. Prior to asserting the 
printer input STROBE pin, the data must be at the printer's data pins at least for 0.5 
us. This is data setup time. 

4. The STROBE must stay low for at least 0.5 us before the computer brings it back 
to high. The data must stay at the printer's data pins at least 0.5 us after the 
STROBE pin is deasserted (brought back to high). 

5. The activation of STROBE causes the printer to assert its BUSY output pin high, 
indicating to the computer to wait until it finishes taking care of the last byte. 

6. When the printer is ready to accept another byte, it sends the ACKNLG signal back 
to the computer by making it low. The printer keeps the ACKNLG signal low only 
for 5 us. At the rising edge of ACKNLG, the printer makes the BUSY (not BUSY = 
ready) pin low to indicate that it is ready to accept the next byte. 


The CPU can use either the ACKNLG or BUSY signals from the printer to initiate
the process of sending another byte to the printer. Some systems use BUSY and some use
ACKNLG.
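
The six communication steps listed above can be sketched as a small model. This is an illustrative Python sketch only: the class and function names are our own, timing (the 0.5 us setup, strobe, and hold times) is not modeled, and the printer is reduced to its BUSY flag.

```python
class Printer:
    """Toy Centronics printer: tracks BUSY and the bytes it has accepted."""

    def __init__(self):
        self.busy = False     # BUSY pin: True = high = cannot accept data
        self.accepted = []    # bytes latched so far

    def strobe(self, data):
        """Computer pulses STROBE low while data sits on the data pins."""
        self.accepted.append(data)
        self.busy = True      # step 5: printer raises BUSY while it works

    def acknowledge(self):
        """Step 6: printer pulses ACKNLG low, then brings BUSY low."""
        self.busy = False


def send_byte(printer, data):
    """Return True if the byte was handed over, False if the printer is busy."""
    if printer.busy:          # step 1: check BUSY before doing anything
        return False
    printer.strobe(data)      # steps 2 to 4: data on the pins, pulse STROBE
    return True


p = Printer()
print(send_byte(p, 0x41))     # accepted, printer now busy
print(send_byte(p, 0x42))     # refused until the printer acknowledges
p.acknowledge()
print(send_byte(p, 0x42))     # accepted
```

A real driver would poll BUSY (or wait for ACKNLG) in a loop with a time-out instead of giving up at once.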


IBM PC printer interfacing 


In the IBM PC, the POST (power-on self-test) portion of BIOS is programmed to 
check for printers connected to parallel ports. As they are identified, the base I/O port 
address of each is written into the BIOS data area 0040:0008 to 0040:000FH just like the 
COM port discussed in Chapter 17. A total of 8 bytes of memory in the BIOS data area 
can store the base I/O address of four printers, each taking 2 bytes. 

Memory locations 00408H to 0040FH can be checked to see which LPT (line printer)
port is available. Memory locations 0040:0008H and 0040:0009H (physical locations
00408H and 00409H) hold the base I/O address of LPT1, and so on, as shown in
Table 18-7. If no printer port is available, 0s are found.




It must be emphasized that the base I/O port addresses assigned to LPTs can vary from
one PC to another. This is due to the fact that the POST (power-on self-test) portion
of BIOS will check for the existence of a printer port first at I/O address 03BCH, then
at 0378H, and finally, at 0278H. Whichever is found first will be written into BIOS data
area 408H, where the base I/O address for LPT1 is expected; the second one found is
written to the 40AH address for LPT2; and so on.


Table 18-7: BIOS Data Area for LPT Base I/O Addresses

Memory Location            Base I/O Address of
0040:0008 - 0040:0009      LPT1
0040:000A - 0040:000B      LPT2
0040:000C - 0040:000D      LPT3
0040:000E - 0040:000F      LPT4


Table 18-8: IBM PC Printer Ports and Their Functions

         Data Port (R/W)    Status Port (Read Only)    Control Port (R/W)
LPT1     03BCH              03BDH                      03BEH
LPT2     0378H              0379H                      037AH
LPT3     0278H              0279H                      027AH


Printer interfacing circuitry uses only three I/O ports starting at the base address:
one I/O port for the LPT's data lines, one for the LPT's status lines, and one for the LPT's
control lines. For example, if the base I/O port address for LPT1 is 378H, the I/O port
address 378H is used for the data, 379H for the status, and 37AH for the control signals.
See Table 18-8 for the assignments. Also see Example 18-7.
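
The probing order and the base, base + 1, base + 2 layout can be sketched as follows. This is a Python illustration under our own naming; the set `present` simply stands for whichever ports POST would find, since the actual hardware test is not modeled.

```python
PROBE_ORDER = (0x3BC, 0x378, 0x278)   # order in which POST looks for ports


def assign_lpt_ports(present):
    """Number the ports that exist in the order POST finds them."""
    found = [base for base in PROBE_ORDER if base in present]
    return {f"LPT{i + 1}": base for i, base in enumerate(found)}


def lpt_registers(base):
    """The three I/O ports that belong to one LPT base address."""
    return {"data": base, "status": base + 1, "control": base + 2}


# A PC with ports at 378H and 278H only: 378H becomes LPT1.
ports = assign_lpt_ports({0x278, 0x378})
for name, base in ports.items():
    regs = lpt_registers(base)
    print(name, {k: f"{v:03X}H" for k, v in regs.items()})
```

This is why the same physical port can be LPT1 on one machine and LPT2 on another, and why programs should read the base address from the BIOS data area instead of assuming it.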


Figure 18-9 shows the printer's data, status, and control ports. 


Example 18-7 

Using DEBUG, determine which printer port(s) are available. 
Solution: 

C:\>DEBUG 


-D 40:08 L8
0040:0008 78 03 00 00 00 00 00 00 


This shows that the base I/O address of LPT1 is 0378H. No other printers are connected to the par- 
allel ports. 
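
The same interpretation can be done mechanically. The Python sketch below (illustrative only; the variable names are ours) takes the eight bytes dumped from 0040:0008 and splits them into four little-endian words, one per LPT.

```python
import struct

# The eight bytes shown in the dump of 0040:0008 through 0040:000F.
area = bytes.fromhex("78 03 00 00 00 00 00 00")

# Four 16-bit little-endian words: LPT1, LPT2, LPT3, LPT4.
bases = struct.unpack("<4H", area)

for number, base in enumerate(bases, start=1):
    if base:
        print(f"LPT{number}: base I/O address {base:04X}H")
    else:
        print(f"LPT{number}: not installed")
```

The bytes 78 03 read low byte first give 0378H for LPT1, and the three zero words mean no other ports were found.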


What is printer time-out? 


Occasionally, the printer time-out message will appear on the screen. This means 
that the printer port is installed but the printer is not ready to print. This could be due to 
the fact that the printer is turned off, the printer is not on line, or some other condition in 
which the printer is connected to the PC but not ready to print. Upon detecting that the 
printer port is installed, BIOS tries repeatedly for a period of 20 seconds to see if it is ready 
to accept data. If the printer is not ready, the PC gives up (time-out) and displays a mes- 
sage to indicate that. Can the PC be forced not to give up so soon and try a little bit longer? 
The answer is yes. The amount of time that BIOS tries to get a response from the printer 
is stored in BIOS data area 0040:0078 to 0040:007B. Location 0040:0078 holds the time- 
out time for LPT1, 0040:0079 the time for LPT2, and so on. At boot time, these locations 
are initialized to 20 seconds. 




Bit    Data Port     Status Port    Control Port
D0     Data bit 0    Reserved       STROBE
D1     Data bit 1    Reserved       AUTO FDXT
D2     Data bit 2    IRQ            INIT
D3     Data bit 3    ERROR          SLCT(In)
D4     Data bit 4    SLCT           IRQ Enable
D5     Data bit 5    PE             Direction
D6     Data bit 6    ACK            Reserved
D7     Data bit 7    BUSY           Reserved

D5 of the control port is used in extended mode only. Extended mode allows
use of D0 - D7 as a bidirectional data bus. Not all PCs support extended mode.


Figure 18-9. Printer's Data, Status, and Control Ports


ASCII control characters


Certain characters in ASCII are used to control the printer. Table 18-9 shows the
most commonly used printer control characters in ASCII.


Table 18-9: ASCII Printer Control Characters

Symbol    Hex Code    Function
BS        08          Backspace
HT        09          Horizontal tab
LF        0A          Line feed (advances one line)
VT        0B          Vertical tab
FF        0C          Form feed (advances to next page)
CR        0D          Carriage return (returns to left margin)


Programming the IBM PC printer with BIOS INT 17H


BIOS INT 17H provides three services: printing a character, initializing the printer
port, and getting the printer status port. These options are selected according to the
value set in the AH register. This is described as follows.


INT 17H, AH = 0 (print a character)


If this option is selected, INT 17H expects to have the LPT number in register DX
(0 for LPT1, 1 for LPT2, and 2 for LPT3) and the ASCII character to be printed in the AL
register. Upon return, INT 17H provides the status of the selected printer port as follows.


Bit No.    Function
7          1 = Not BUSY (ready), 0 = BUSY
6          1 = Acknowledge
5          1 = Out of paper
4          1 = Printer selected
3          1 = I/O error
2,1        unused
0          1 = Printer time-out


See Example 18-9.


INT 17H, AH = 02 (get the printer port status)


This option allows a programmer to check the status of the printer. Before calling
the function, AH is set to 2 and DX holds the printer number (0 = LPT1, 1 = LPT2, and 2
= LPT3). After the call, AH holds the status, with the same bit assignments as shown under
option 0. See Example 18-10.


Example 18-10 


Run the following program in DEBUG to check the LPT1 printer states. Run it once with the print- 
er off line, then run it again with the printer on line. Interpret the AH register upon return. 

        MOV  AH,2
        MOV  DX,0
        INT  17H


Solution:


C:\>DEBUG
-A
16B7:0100 MOV AH,2
16B7:0102 MOV DX,0
16B7:0105 INT 17
16B7:0107 INT 3
16B7:0108
-G
AX=0800 BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=16B7 ES=16B7 SS=16B7 CS=16B7 IP=0107 NV UP DI PL NZ NA PO NC
16B7:0107 CC INT 3
-G=100
AX=9000 BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=16B7 ES=16B7 SS=16B7 CS=16B7 IP=0107 NV UP DI PL NZ NA PO NC
16B7:0107 CC INT 3
-Q


The first execution of the program occurred when the printer was off line. It returned AH = 08. This 
indicates an I/O error. The program was run again with the printer on line. This time it returned AH 
= 90, which indicates "not busy." 
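
The two AH values in this example can be decoded bit by bit using the status bit assignments given under option 0. The following Python sketch is only an illustration; the table and function names are our own.

```python
# Status bits returned in AH by INT 17H (bits 2 and 1 are unused).
STATUS_BITS = {
    7: "not busy",
    6: "acknowledge",
    5: "out of paper",
    4: "printer selected",
    3: "I/O error",
    0: "time-out",
}


def decode_status(ah):
    """Return the names of all status bits that are set, highest bit first."""
    return [name for bit, name in sorted(STATUS_BITS.items(), reverse=True)
            if ah & (1 << bit)]


print(decode_status(0x08))   # printer off line
print(decode_status(0x90))   # printer on line
```

AH = 08H has only bit 3 set (I/O error), while AH = 90H has bits 7 and 4 set, so besides "not busy" the printer also reports that it is selected.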


Inner working of BIOS INT 17H for printing a character 


Below is a listing of a portion of BIOS INT 17H with some modification for the 
sake of clarity. It shows how a character is printed by monitoring the BUSY signal from 
the printer and issuing a STROBE. Figure 18-10 diagrams printer timing. 


;This skeleton of BIOS INT 17H shows how a character is issued
;to the printer. It is taken from
;the IBM PC Technical Reference. The instructions not shown here
;include:
; (a) Loading DX with base I/O address of printer port from BIOS
;     data area 0040:0008 - 000F
;Reminder: The base I/O address is the address of the printer
;     data bus.
; (b) Loading the time-out value into BL reg from BIOS data area
;     0040:0078H - 007B
;Therefore we have the following upon going into this portion of
;program:
;BL=time out value
;DX=has the base I/O address, which is LPT's data port
;AL=character to be printed

        OUT   DX,AL        ;OUTPUT CHARACTER TO BE PRINTED
        INC   DX           ;POINT TO STATUS PORT
B3:     SUB   CX,CX        ;TIMER VALUE FOR BUSY
B3_1:   IN    AL,DX        ;GET STATUS
        MOV   AH,AL        ;SAVE IT IN AH
        TEST  AL,80H       ;IS BUSY LINE HIGH? (D7 = BUSY)
        JNZ   B4           ;IF READY THEN OUTPUT THE STROBE
        LOOP  B3_1         ;TRY AGAIN
        DEC   BL           ;DROP LOOP COUNT
        JNZ   B3           ;GO UNTIL TIME OUT ENDS
        OR    AH,01        ;SET ERROR FLAG
        AND   AH,0F9H      ;TURN OFF THE OTHER BITS
        IRET               ;RETURN WITH ERROR FLAG BIT SET
B4:     MOV   AL,0DH       ;SET STROBE HIGH
        INC   DX           ;DX=I/O PRINTER CNTL REG
        OUT   DX,AL        ;STROBE IS BIT 0 OF CONTROL REG
        MOV   AL,0CH       ;SET STROBE LOW
        OUT   DX,AL        ;AND SEND IT TO PRINTER CONTROL PORT

Notice the steps taken to print a character in the above listing. 


1. Send the character to the D7-D0 latch connected to the data pins of the printer.

2. Test to see if BUSY is low (NOT BUSY). If ready (NOT BUSY), issue the
STROBE to ask the printer to grab the data by making STROBE = high and then
STROBE = low.

3. If the printer is BUSY, try again until the time-out is finished. The time-out forces
the CPU to check the printer for a period of time before it gives up.

4. If for whatever reason the printer does not respond after trying repeatedly, go back
and set the time-out bit to indicate that.
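
The control flow of the listing can be followed more easily in a high-level sketch. The Python model below is an illustration only, not the BIOS code: the three callables stand in for the IN and OUT instructions and are our own invention, and `inner_count` is a parameter we added so the model can be run quickly (SUB CX,CX followed by LOOP goes around 65,536 times in the listing).

```python
def print_char(read_status, write_data, write_control, char, timeout,
               inner_count=65536):
    """Model of the INT 17H skeleton: output, poll BUSY, strobe or time out."""
    write_data(char)                      # OUT DX,AL at the data port
    status = 0
    for _ in range(timeout):              # BL: outer time-out count
        for _ in range(inner_count):      # CX: inner polling loop
            status = read_status()        # IN AL,DX at the status port
            if status & 0x80:             # D7 = 1: printer is ready
                write_control(0x0D)       # first write to the control port
                write_control(0x0C)       # second write ends the strobe
                return status
    return (status | 0x01) & 0xF9         # set time-out bit, clear bits 2,1


# A printer that is always ready, then one that is always busy.
control_writes = []
ok = print_char(lambda: 0x90, lambda c: None, control_writes.append,
                0x41, timeout=20)
print(hex(ok), [hex(v) for v in control_writes])

late = print_char(lambda: 0x10, lambda c: None, lambda v: None,
                  0x41, timeout=2, inner_count=4)
print(hex(late))
```

Note that the character is written to the data port before BUSY is tested, exactly as in the listing; only the strobe is held back until the printer is ready.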


Figure 18-10. Printer Timing
(Signals shown: DATA and STROBE; us = microsecond)


Bidirectional LPT port


Since the introduction of the first IBM PC in 1981, the PC's parallel port has gone 
through various changes. Next we give an overview of SPP, PS/2, EPP, and ECP parallel 
port types and provide some parallel port interfacing tips. First, we discuss the character- 
istics of each parallel port type. 


SPP 

SPP stands for standard parallel port. This refers to the parallel port of the first
IBM PC introduced in 1981. The data bus in SPP is unidirectional and is designed to send
data from the PC to the printer. At that time, designers never thought that someone might
want to use the LPT's data bus for input. In SPP, the internal logic circuitry is set for data
output only and any attempt to use the data bus for input can damage the LPT port. For
this reason you should never try to modify the LPT port unless you know what you are
doing. Some designers use the status and control port of the SPP to send data in. In such
cases, the pull-up resistors are used to prevent damage to the LPT's parallel port. For
further information, refer to web page http://www.lvr.com.


PS/2 


The first change in the data bus portion of the LPT port occurred in 1987 with the 
introduction of PS/2 computers by IBM. By then, designers had seen the potential use of 
parallel ports for fast data acquisition. Therefore, internal circuitry of the data section of 
the LPT port in the x86 PC was changed to make it bidirectional. However, upon bootup, 
BIOS configured the LPT port as SPP, meaning that it was to be used only for data output. 
At the same time, the C5 bit of the control port (base +2) was modified to allow the user 
to change the data port direction. At bootup, C5 is low (C5 = 0) meaning that the data port 
is for output. By making C5 = 1, we can make the data port an input port. Recall from the 
last section that control port C0 to C4 was already used by the SPP. Therefore, in the LPT
port, C5 of the control port is used for data port direction, while C6 and C7 are reserved. 


Example 18-11 


Assume that the I/O base address = 278H for LPT2 in a PS/2-compatible PC. Show how to change 
the control bit C5 to make the data port an input port. 


Solution: 


The I/O base address = 278H is for the data port. This means that we have 279H for the status port 
and 27AH for the control port. 


        MOV  DX,27AH         ;DX=control port address
        IN   AL,DX           ;get the current information
        OR   AL,00100000B    ;make C5=1 without changing anything else
        OUT  DX,AL           ;now data port is an input port


Examine the I/O addresses 278H, 279H, 27AH in Appendix E. It says that the data port is RW 
(read/write), the status port is RO (read only), and the control port is RW. 


How to detect a bidirectional data bus 


The following are steps in detecting if your LPT data port is bidirectional. 


1. Put the data port in bidirectional mode by writing 1 to C5 of the control port (C5 =
1).

2. Write a known value (such as 55H, AAH, or 99H) to the data port.

3. Read back the value from the data port.

4. If the read value matches the value written to the data port, the data port is not
bidirectional.
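
The four steps can be sketched as a routine that takes the port access functions as parameters. This is a Python illustration only: `inp` and `outp` are assumed stand-ins for the IN and OUT instructions, `FakeLpt` is a made-up model of the hardware used just to exercise the routine, and restoring the control port at the end is our own addition to the steps.

```python
C5_DIRECTION = 0x20   # bit 5 of the control port (base + 2)


def is_bidirectional(inp, outp, base, pattern=0x55):
    """Apply the four detection steps to the LPT port at the given base."""
    control = base + 2
    saved = inp(control)
    outp(control, saved | C5_DIRECTION)   # step 1: set C5 = 1
    outp(base, pattern)                   # step 2: write a known value
    readback = inp(base)                  # step 3: read it back
    outp(control, saved)                  # put the control port back
    return readback != pattern            # step 4: a match means SPP


class FakeLpt:
    """Made-up port model: input mode returns what is on the pins."""

    def __init__(self, base, bidirectional, external=0xFF):
        self.base = base
        self.bidirectional = bidirectional
        self.external = external   # value present on the pins from outside
        self.data = 0
        self.control = 0

    def outp(self, port, value):
        if port == self.base:
            self.data = value
        elif port == self.base + 2:
            self.control = value

    def inp(self, port):
        if port == self.base + 2:
            return self.control
        if port == self.base:
            if self.bidirectional and self.control & C5_DIRECTION:
                return self.external
            return self.data
        return 0xFF


spp = FakeLpt(0x278, bidirectional=False)
ps2 = FakeLpt(0x278, bidirectional=True)
print(is_bidirectional(spp.inp, spp.outp, 0x278))
print(is_bidirectional(ps2.inp, ps2.outp, 0x278))
```

If the pins happen to carry the same value as the test pattern, the check would wrongly report an output-only port, so repeating it with a second pattern such as AAH is a reasonable precaution.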




By making the data port bidirectional in the PS/2, IBM set a new standard, allow- 
ing many devices such as tape backup, scanners, and data acquisition instruments to use 
the LPT port instead of a PC expansion slot. However, there is one problem with LPT 
ports: They are too slow. This slowness led Intel and Xircom, along with other companies, 
to set a new LPT standard called EPP. 




EPP 


EPP stands for enhanced parallel port. It works like the bidirectional PS/2 port, but is much faster. Recall from
Section 18.3 that handshaking signals such as the strobe signal are generated by software. 
In EPP, a higher speed was achieved by delegating the handshaking signals to the hard- 
ware circuitry on the LPT port itself. The EPP standard also added new registers to the I/O 
port address space beyond base address +2. In EPP, the I/O space goes from base to 
base+7. For example, if the base address is 278H, 279H and 27AH are the same as SPP. 
However, I/O addresses 27BH through 27FH are also used or reserved. 


ECP 


ECP stands for extended capability port. The need for an even faster LPT port led
to ECP. The ECP has all the features of EPP plus DMA (direct memory access) capability,
allowing it to transfer data via the DMA channel. It also has data compression capability.
The DMA and data compression capabilities make ECP an ideal port for high-speed
peripherals such as laser printers and scanners. This is the reason that Hewlett Packard
joined with Microsoft in developing the ECP standard. While the ECP type LPT port is
supposed to support SPP, PS/2, and EPP, not all ECP ports can emulate EPP.

To see if a given ECP supports EPP, examine the PC technical documentation or 
check the CMOS setup on your PC. 

In order to unify these various types of LPT ports, a committee of the IEEE has 
put together specification IEEE 1284. Refer to the IEEE web page: http://www.ieee.org.


Review Questions 


1. The Centronics printer standard uses ____ (serial, parallel) data transfer.

2. Give one reason why there are 8 bits for the data lines in the Centronics standard.

3. The status signals of the printer are ____ (in, out) for the printer and
____ (in, out) for the computer.

4. The control signals of the printer are ____ (in, out) for the printer and
____ (in, out) for the computer.

5. STROBE is an ____ (in, out) signal for the printer and ____ (in, out) for the
computer.

6. ACKNLG is an ____ (in, out) signal for the printer and ____ (in, out) for the
computer.

7. D1-D8 are ____ (in, out) signals for the printer and ____ (in, out) for the
computer.

8. BUSY is an ____ (in, out) signal for the printer and ____ (in, out) for the
computer.

9. Out of paper is an ____ (in, out) signal for the printer and ____ (in, out) for the
computer.


10. How does the computer know if the printer got the last byte sent and is ready for 
the next one? 

11. State the role and level of activation for the STROBE signal. 

12. If the base I/O address of a given LPT is 3BCH, give the I/O address for each of the
following lines of the printer. 
(a) control (b) status (c) data 

13. Assuming that the I/O base address of LPT1 is 378H, show a simple Assembly lan- 
guage program that monitors the BUSY line of the printer. 

14. What is a time-out in IBM PC terminology? 

15. Give the ASCII codes for the carriage return and line feed in hex. 




PROBLEMS 


SECTION 18.1: INTERFACING THE KEYBOARD TO THE CPU 


1. In reading the columns of a keyboard matrix, if no key is pressed we should get all
____ (1s, 0s).

2. In Figure 18-1, to detect the key press, which of the following is grounded?
(a) all rows (b) one row at time (c) both (a) and (b)

3. In Figure 18-1, to identify the key pressed, which of the following is grounded?
(a) all rows (b) one row at time (c) both (a) and (b)

4. For Figure 18-1, indicate the column and row for each of the following.
(a) D3-D0 = 0111 (b) D3-D0 = 1110

5. Indicate the steps to detect the key press.

6. Indicate the steps to identify the key pressed.

7. Modify Program 18-1 and Figure 18-1 for a 4 x 5 keyboard (4 rows, 5 columns).

8. Modify Program 18-1 and Figure 18-1 for a 6 x 6 keyboard.

9. Indicate an advantage and a disadvantage of using an IC chip for keyboard scanning
and decoding instead of using a microprocessor.

10. What is the best compromise for the answer to Problem 9?


SECTION 18.2: PC KEYBOARD INTERFACING AND PROGRAMMING 


11. In the IBM PC for each key press (make), bits are transferred to the main CPU.
What are these bits?

12. Find the break code for the following make codes.
(a) 34H (b) 1AH (c) 5FH

13. Identify make and break among the following codes.
(a) 9BH (b) 89H (c) 17H (d) C2H (e) 79H

14. Since keys "5" and "%" have the same scan code, how are they distinguished?

15. Find the scan code for the following.
(a) ALT F2 (b) SHIFT F4 (c) &
(d) V (e) Pg Up (f) F6

16. Which option of INT 16H is used to get the status bytes of enhanced keyboards?

17. Write a program to display a prompt such as "I will play for you the 'Happy Birthday'
music if you guess the key I am thinking of. The key is one of the ALT Fs." Using INT
16H to monitor the scan codes continuously, if Alt F9 is activated the PC should play
"Happy Birthday" and exit to DOS (for "Happy Birthday" music, see Chapter 13). If
any other key is pressed it should display a message such as "Try again" and continue.
The Esc key should exit to DOS.

18. INT 09 is assigned to which IRQ of the 8259?

19. True or false. INT 09 is activated for both the make and break scan codes.

20. True or false. If CapsLock is pressed, INT 09 saves it in the keyboard buffer.

21. What value does the BIOS keyboard subroutine save in the keyboard buffer if there is
no ASCII code for a given key?

22. When there is keyboard overrun, which of the following generates the sound beep: the
circuitry inside the keyboard or the motherboard?

23. The keyboard shift status byte indicates the status of which keys?

24. Which of the following keys are non-ASCII keys?
(a) HOME (b) ! (c) Arrow (d) *

25. Give the content (in hex and binary) of the keyboard shift status byte if only the
NumLock and CapsLock are on.

26. If the content of the first keyboard shift byte is 10000001, what does it mean?

27. Give the physical address of memory locations in the BIOS data area set aside for each
of the following keyboard components.
(a) shift status byte (the first one) (b) buffer
(c) buffer's tail address (d) buffer's head address




28. When the addresses of the head and tail are the same, what does it mean? 

29. The key buffer is ____ (empty, full) if the address of the tail is one number higher
than the address of the head. 

30. What keyboard technology is used in IBM enhanced keyboards? 


SECTION 18.3: PRINTER AND PRINTER INTERFACING IN THE IBM PC 


31. State the four categories of Centronics printer pins. 

32. Of the following pins, which belongs to the printer's status signal and which belongs 
to the printer's control signal categories? Indicate which is input and which is output 
from the point of view of the PC. 

(a) BUSY 
(b) STROBE 
(c) ACKNLG 
(d) SLCT 

(e) INIT 

(f) PE 

33. Which pin is used by the printer to indicate that it is out of paper? 

34. True or false. Each data line has its own ground return line. 

35. What is the function of the BUSY signal? 

36. In response to STROBE, the printer makes ACKNLG ____ (low, high).

37. When does the BUSY signal go low? 

38. True or false. The base I/O address for the printer port can be 378H or 3BCH. 


ANSWERS TO REVIEW QUESTIONS 


SECTION 18.1: INTERFACING THE KEYBOARD TO THE CPU 


1. True
2. Column 3
3. True
4. 0
5. True


SECTION 18.2: PC KEYBOARD INTERFACING AND PROGRAMMING 


1. The scan code is 24H. This has the odd parity bit of 1; therefore, the following bits
are transferred from the keyboard to the motherboard: 1 1 00100100 0.
2. The scan code for break is always 80H larger than the scan code for make.
3. 2DH = 00101101 and ADH = 10101101
4. False, for both
5. Yes, it is 38H.
6. False
7. 48H
8. True
9. True
10. False; the buffer inside the keyboard is full.
11. The keyboard buffer on the motherboard is empty.
12. Tail


SECTION 18.3: PRINTER AND PRINTER INTERFACING IN THE IBM PC 


1. Parallel
2. Since the characters are 8-bit ASCII code
3. Out, in
4. In, out
5. In, out
6. Out, in
7. In, out
8. Out, in
9. Out, in
10. Through the ACKNLG or BUSY signals; either one can be used.
11. It must be high normally. When the computer has a byte of data for the printer it
makes it go low to inform the printer.
12. (a) 3BEH for control (b) 3BDH for status (c) 3BCH for data
13.     MOV  DX,379H    ;LPT1 STATUS PORT ADDRESS
    A1: IN   AL,DX      ;GET THE LPT1 STATUS
        TEST AL,80H     ;IS D7=1 (BUSY SIGNAL)?
        JNZ  ...        ;NOT BUSY
        JMP  A1         ;TRY AGAIN


14. When the PC tests the printer status port and cannot get any response from the
printer, it tries again for a certain time period; then if it does not get any response,
it sets the time-out bit.


15. 0DH and 0AH




CHAPTER 19 


HARD DISKS 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Contrast and compare the terms primary storage and secondary storage

>> Discuss hard disk organization in terms of the boot record, FAT, and
the directory

>> Analyze the capacity of hard disks in terms of sectors, tracks, clusters,
cylinders, and platters

>> Define hard disk terminology: partitioning, interleaving, low-level and
high-level formatting, parking the head, and MTBF

>> Define the components of hard disk access time: seek time, settling time,
and latency time




This chapter examines the characteristics of hard disk storage. It covers hard disk
organization, data encoding techniques, interfacing standards, and definitions of hard
disk terminology.


SECTION 19.1: HARD DISK ORGANIZATION AND PERFORMANCE


In the early days of the personal computer, cassette tape was used to store infor- 
mation. Due to its long access time, it was abandoned as a secondary storage medium. The 
term secondary storage refers to memory other than RAM. RAM is called primary stor- 
age since the CPU asks for the information that it needs from RAM first. This section will 
look at the characteristics of hard disks and their organization with emphasis on perform- 
ance factors such as access time and interfacing standards. The hard disk, referred to 
sometimes as fixed disk, or Winchester disk in IBM literature, is judged according to three 
major criteria: capacity, access time (speed of accessing data), and interfacing standard. 
Before delving into each category, an explanation should be given for the use of different 
names such as fixed disk, Winchester disk, and hard disk to refer to the same device. The 
term hard disk comes from the fact that it uses hard solid metal platters to store informa- 
tion instead of plastic as is the case in floppy disks. It is also called fixed disk because it 
is mounted (fixed) at a place on the computer and is not portable like the floppy disk 
(although some manufacturers make removable hard disks). Why is it also called the 
Winchester disk? When IBM made the first hard disk for mainframes it was capable of 
storing 30 megabytes on each side and therefore was called a 30/30 disk. The 30/30 began 
to be called the Winchester 30/30, after the rifle, and soon it came to be known simply as 
the Winchester disk. 


Capacity of the disk 


In order to store data on the disk, both sides are coated with magnetic materials. 
The principles behind the process of reading and writing (storing) digital data on disks are
the same as those used in any magnetic-based medium. Each side of the disk is organized
into tracks and sectors as shown in Figure 19-1. Tracks are organized as concentric circles 
and their number per disk varies from disk to disk, depending on the size and technology. 
Each track is divided into a number of sectors, and again the number of sectors per track 
varies, depending on the density of the disk. Each sector stores 512 bytes of information. 


Figure 19-1. Sectors and Tracks of a Disk


Hard disk capacity and organization


One of the most important factors in judging a hard disk is its capacity, the number of
bytes it can store. The capacity of hard disks ranges from a few hundred megabytes to
many thousands of gigabytes (a gigabyte is 1024 megabytes). Regardless of capacity, all
hard disks use hard metal platters to store data. In general, the higher the number of
platters, the higher the capacity of the disk. Just as in the floppy disk, both sides of
each platter are coated with magnetic material. Likewise, the hard disk uses a storage
scheme that divides the area into sectors and tracks. There is one read/write head for
each side of every platter, and these heads all move together. For example, a hard disk
with 4 platters might have 8 read/write heads, one for each side, and they are all moved
from the outer tracks to the inner tracks by the same arm. Hard disks give rise to a more
complex organization and hence a new term, the cylinder, which consists of all the tracks
of the same radius on each platter. Since all the read/write heads move together from
track to track, it is logical to talk about cylinders in addition to tracks in the hard
disk. Why do all the heads move together? The answer is that it is too difficult and
expensive to design a hard disk controller that controls the movement of so many
different heads. In addition, independent heads would prolong the access time, since the
controller would have to stop one head and then activate a different head continuously
until it reaches the end of the file. Using the concept of the cylinder, all the tracks
of the same radius are accessed at the same time, and if the end of the file is not
reached, all the heads move together to the next track. The number of read/write heads
varies from one hard disk to another. Knowing the number of cylinders makes it possible
to calculate the total capacity of the hard disk. The total capacity of a disk is
calculated as follows:


number of tracks = number of cylinders x tracks per cylinder
HD capacity = number of tracks x sectors per track x sector density (bytes per sector)


See Example 19-1. 
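
The two formulas above can also be checked with a few lines of code. The following is a minimal sketch in Python; the function name is hypothetical, and the sample geometry (4864 cylinders, 255 tracks per cylinder, 63 sectors per track, 512 bytes per sector) is the one reported in Figure 19-2.

```python
def disk_capacity_bytes(cylinders, tracks_per_cylinder, sectors_per_track,
                        bytes_per_sector=512):
    """Return the total capacity in bytes from the disk geometry."""
    total_tracks = cylinders * tracks_per_cylinder      # number of tracks
    total_sectors = total_tracks * sectors_per_track    # number of sectors
    return total_sectors * bytes_per_sector             # HD capacity


capacity = disk_capacity_bytes(4864, 255, 63)
print(capacity)                     # 40007761920
print(round(capacity / 2**30, 2))   # 37.26 (capacity in units of 2^30 bytes)
```

Dividing by 2^30 rather than by one billion is what turns roughly 40 billion bytes into the 37.26 GB figure shown by the operating system.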


The sectors of a disk are grouped into clusters. Cluster size varies among disk
sizes, but it is always a power of 2 (2, 4, 8, 16, 32, ...) sectors per cluster. The file
allocation table, or FAT, keeps track of which clusters are used to store each file.


Formatting disks


Once a disk is formatted, the computer can read from or write to that disk.
Formatting organizes the sectors and tracks in a way that makes it possible for the disk
controller to access the information on the disk. When a disk is formatted, a number of
sectors are set aside for various functions and the remaining sectors are used to store the
user's files. The formatting process sets aside a specific number of sectors for the boot
record, directory, and FAT (file allocation table), each of which is explained in detail
below. It also copies some system files onto the disk if it was formatted with the "/s"
option, which makes it a bootable disk. The difference between bootable and nonbootable
disks will be explained later.


Item                        Value
Description                 Disk drive
Manufacturer                (Standard disk drives)
Bytes/Sector                512
Media Loaded                Yes
Media Type                  Fixed hard disk media
Partitions                  1
Sectors/Track               63
Size                        37.26 GB (40,007,761,920 bytes)
Total Cylinders             4,864
Total Sectors               78,140,160
Total Tracks                1,240,320
Tracks/Cylinder             255
Partition                   Disk #0, Partition #0
Partition Size              37.25 GB (39,999,504,384 bytes)
Partition Starting Offset   32,256 bytes

Figure 19-2. Screenshot of a Hard Disk in x86 Taken from System Information


Example 19-1


Verify the capacity of the hard disk using the data in Figure 19-2.

Solution:

As shown in Figure 19-2, the hard disk has 4864 cylinders, 255 tracks per cylinder, 63 sectors per
track, and 512 bytes per sector:

Total tracks = 4864 cylinders x 255 tracks per cylinder = 1,240,320 tracks

Total sectors = 1,240,320 tracks x 63 sectors per track = 78,140,160 sectors

Total bytes = 78,140,160 sectors x 512 bytes per sector = 40,007,761,920 bytes

Note that in Figure 19-2 the capacity of the drive is expressed as 37.26 GB, where GB = 2^30 bytes.


Disk organization 


Regardless of the type of disk that is used, the first sector of the disk (side 0, track
0, sector 0) is always assigned to hold the boot record; then some sectors are used for
storage of the FAT (file allocation table) copies 1 and 2. The number of sectors set aside
for FAT depends upon the disk density. After the FAT, the directory is stored in consecutive
sectors. Again, the number of sectors used by the directory depends on disk density. In
assigning sectors for the FAT and the directory, the operating system uses all the sectors
of track 0 of side 0, then goes to side 1 and uses all the sectors of track 0 of side 1, then
comes back to side 0 and uses track 1, then goes to side 1 of track 1, and so on. See Figure
19-2. Before moving to the next topic it should be noted that the number of sectors
assigned for the boot record, FAT, and directory are fixed for a given kind of disk and
operating system version, and it is only after assigning these sectors to those essential
functions that the OS uses the remaining sectors to store files. The number of sectors set
aside for each of the above can be calculated from the information in the boot record. The
boot record, FAT, and directory are explained in detail next.


Looking into the boot record


When a disk is formatted, the first sector is used for the boot record. It is from the
boot record that the computer will know the disk type, sector density, total number of
sectors in the disk, and other essential information needed by BIOS and the operating system.

To examine the boot sector, use the WinHex utility, available from many web sites.
Google WinHex for available sites.


Bootable and nonbootable disks 


If the disk is formatted as a system disk (bootable), the first two files are IO.SYS 
and MSDOS.SYS, which are followed by COMMAND.COM. The first two are hidden 
files; therefore, they will not be listed when DIR is used. However, if the disk is format- 
ted as a nonbootable disk, it will not have those three files on it after it is formatted. The 
job of IO.SYS is to provide low-level (hardware) communication (interface) between 
BIOS and DOS. The high-level (software) interface is provided by the MSDOS.SYS file. 
This is the section of DOS that contains INT 21H, among other things. The functions
of COMMAND.COM include providing the DOS prompt ">" and reading, interpreting, and
executing commands typed in by the user. The SYS command can be used to copy these files
to a nonbootable disk to make it bootable. 


FAT (file allocation table) 


If the boot record tells BIOS and the operating system the kind of disk, and the 
directory provides the lists of all the files contained on the disk, how does the operating 
system locate a given file? Does it check every one of the hundreds of sectors to see if 
the file is there? This would obviously take an inordinate amount of time. It is the func- 
tion of the FAT to provide a road map for the operating system to find where each file is 
located. In fact, the FAT is so critical to the operating system's ability to locate files that 
two copies of the FAT are kept on the disk, one for use and another one for backup in case 
something happens to the first one. If both are damaged, the operating system cannot find 
any file on that disk. The FAT is always located in the sectors following the boot record 
sector. The number of sectors used by the FAT varies depending on the size and density 
of the disk. 
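
The "road map" idea can be pictured as a chain of cluster numbers. The sketch below is a deliberately simplified model in Python and not the on-disk FAT layout: the dictionary `fat`, the end marker `END_OF_FILE`, and the function name are all assumptions made for illustration. Each table entry gives the next cluster of the same file, so only the starting cluster of a file is needed to find the rest of it.

```python
END_OF_FILE = -1  # hypothetical end-of-chain marker used only in this toy model


def clusters_of_file(fat, start_cluster):
    """Follow the chain in the table and return the file's clusters in order."""
    chain = []
    cluster = start_cluster
    while cluster != END_OF_FILE:
        chain.append(cluster)
        cluster = fat[cluster]   # the table entry points to the next cluster
    return chain


# Toy table: one file occupies clusters 2 -> 3 -> 7, another occupies 4 -> 5.
fat = {2: 3, 3: 7, 7: END_OF_FILE, 4: 5, 5: END_OF_FILE}
print(clusters_of_file(fat, 2))  # [2, 3, 7]
print(clusters_of_file(fat, 4))  # [4, 5]
```

The model also shows why a damaged table is so serious: if the entries are lost, the data clusters are still on the disk but nothing says which ones belong together.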


Partitioning 


Partitioning the disk is the process of dividing the hard disk into several smaller
disks. For example, a given hard disk of 80 gigabytes capacity can be partitioned into
three smaller logical disks. They are called logical disks since it is the same physical disk,
but as far as the operating system is concerned, they are labeled disks C, D, and E.

After the hard disk has been partitioned, high-level formatting should be per- 
formed next. The C drive must be formatted with the system option (FORMAT C: /S) so 
that the system can boot from drive C. 




Clusters 


In the x86 IBM PC, the sector size is always 512 bytes but the size of the cluster 
varies among disks of various sizes. The cluster size is always a power of 2: 2, 4, 8, and 
so on. The fact that a 1-byte file takes a minimum of one cluster is important and must be 
emphasized. This means that a number of small files on a disk with a large number of sec- 
tors per cluster will result in wasted space on the hard disk. Let's look at an example. In a 
hard disk with a cluster size of 16 sectors (16 x 512 = 8192 bytes), storing a file of 26,000 
bytes requires 4 clusters. The result is a waste of 6768 bytes since 4 x 8192 = 32,768 
bytes, and 32,768 — 26,000 bytes = 6768. One can use the WinHex utility to find the clus- 
ter size. 
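
The wasted-space arithmetic can be sketched in Python as follows. The function names are hypothetical; the 512-byte sector size and the 16-sector cluster are taken from the example above.

```python
def clusters_needed(file_size_bytes, sectors_per_cluster, bytes_per_sector=512):
    """Number of whole clusters required to hold a file of the given size."""
    cluster_bytes = sectors_per_cluster * bytes_per_sector
    # Integer ceiling division: a partly used cluster still counts as a full one.
    return (file_size_bytes + cluster_bytes - 1) // cluster_bytes


def wasted_bytes(file_size_bytes, sectors_per_cluster, bytes_per_sector=512):
    """Bytes allocated to the file but not used by it."""
    cluster_bytes = sectors_per_cluster * bytes_per_sector
    allocated = clusters_needed(file_size_bytes, sectors_per_cluster,
                                bytes_per_sector) * cluster_bytes
    return allocated - file_size_bytes


print(clusters_needed(26000, 16))  # 4
print(wasted_bytes(26000, 16))     # 6768
print(wasted_bytes(1, 16))         # 8191 (a 1-byte file still takes a whole cluster)
```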


Speed of the hard disk 


One of the most important and widely cited hard disk performance factors is its 
speed, or how fast the requested data is available to the user. The hard disk access time is 
in the range of 10-80 ms and is still dropping. This access time is much longer than that
of primary DRAM memory, which is in the range of 50-250 ns and lower. This is the
reason that disk caching is used, as will be discussed later in this chapter. The access time
of the hard disk given by manufacturers is broken down into several smaller times indi- 
cating the speed of different sections of the hard disk. The components of access time are 
seek time, settling time, and latency time. Seek time is the amount of time that the 
read/write head takes to find the desired cylinder or track. The outer tracks (the outer 
cylinder) obviously take less time to find since the head is parked on the outermost track 
and moves into inner tracks. Manufacturers always give the average seek time in the data 
sheet. To reduce seek time, computers use hard disks with several heads parked at differ- 
ent cylinders (tracks), which reduces seek time drastically but also increases the cost. 

Settling time, the second factor in access time, is the time it takes the head to stop 
vibrating before it can begin reading the data. Some manufacturers include settling time 
when they give the average seek time. 

Rotational latency is the time it takes for the desired sector to rotate under the head.
In other words, after the head has settled on the track, the platter is still rotating at a
certain RPM (revolutions per minute) rate, and the head must wait for the sector to arrive.
Rotational latency time depends on the distance between the head and the desired sector,
but in no case is it more than the time for one revolution. This means that rotational
latency is inversely proportional to the RPM of the hard disk. The RPM for various disks
varies between 2,400 and 15,000 (and as high as 18,000 in some recent ones). Again, the
average rotational latency must be taken into consideration. For example, if a given disk
has 3600 RPM, which is 60 rotations per second, each full revolution takes 16.6 ms
(1/60 s = 16.6 ms). Since the desired sector could be directly beneath the head or at the
end of the track, the average latency due to rotation is 8.3 ms
[(16.6 + 0) / 2 = 8.3].
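
The same calculation can be written as a short Python sketch. The function names are hypothetical; the only assumption is the one stated in the text, namely that the average latency is half of one revolution.

```python
def revolution_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm   # 60,000 ms per minute divided by revolutions per minute


def average_rotational_latency_ms(rpm):
    """Average wait for a sector: half of one revolution."""
    return revolution_time_ms(rpm) / 2


print(round(average_rotational_latency_ms(3600), 1))  # 8.3
print(round(average_rotational_latency_ms(5400), 1))  # 5.6
```

The 5400 RPM result agrees with the average latency of 5.6 msec listed for the drives in Table 19-1.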


Interfacing standards in the hard disk 


To ensure that hard disks made by different manufacturers are compatible, common
standards for interfacing the hard disk and personal computers have been devised.
These standards include ESDI, IDE, and SCSI, which are explained below.


ESDI (enhanced small device interface) 


The ESDI standard was developed by a group of disk drive manufacturers in
1983. There are some differences between ESDI and the older ST412 standard:


1. ESDI can achieve a data transfer rate of up to 20 Mbits per second in contrast to 
the 7.5 Mbits/second of the ST412. 

2. With the same RPM as ST412, it can have more sectors per track. The number of 
sectors for ESDI can vary between the 20s and the 50s. 

3. While in ST412 the defect information must be provided manually during low-level
formatting, for ESDI the defect map is already stored on the drive.

4. In the ST412 standard, the number of cylinders, heads, and sectors is stored either
in the CMOS RAM of the system or in the ROM of the hard disk controller, in 
contrast to ESDI, where the configuration information is already provided and there 
is no need to store it externally. 


IDE (integrated device electronics) 


IDE is the standard for current PCs. In IDE, the controller is part of the hard disk. 
In other words, there is no longer a need to buy a hard disk and a separate controller as is 
often the case for ST412. One of the reasons that the IDE drives have a better data trans- 
fer rate is the integration of many of the controller's functions into the drive itself with the 
use of VLSI chips. For example, in the ST412 standard the hard disk read/write heads 
would read the data and transfer it to the controller through the cable, and then the data is 
separated from the clock pulses by what is called data separator circuitry. By eliminating 
cable degradation, IDE and SCSI (discussed next) reach a much higher external data 
transfer rate. 


SCSI (small computer system interface) 


SCSI (pronounced "scuzzy") is one of the most widely used interface standards 
not only for high-performance IBMs and compatibles but also for non-80x86 computers 
by other manufacturers, such as Apple and Sun Micro. The main reason is that unlike IDE, 
SCSI is the standard for all kinds of peripheral devices, not just hard disks. One can daisy 
chain up to seven devices, such as CD-ROMs, optical disk, tape drives, floppy disk drives, 
networks, and other I/O devices, using the SCSI standard. See Figure 19-3. 

All the characteristics discussed for ESDI and IDE apply equally to SCSI. In addi- 
tion, SCSI can have an internal data transfer rate of up to 80 Mbits/second. It must be 
noted that SCSI hard drives always have the controllers embedded into them and there is 
no need for a separate controller. The only thing needed is an adapter to convert the SCSI 
signals to signals compatible with the bus expansion slot of the host computers. 


[Figure: Peripheral 1 through Peripheral 7 daisy chained to a SCSI host adapter, which
connects to the main computer buses]

Figure 19-3. Peripheral Devices in SCSI "Daisy Chain"


Interleaving


As the read/write head moves along the track, it must read each sector and pass it
to the controller. The controller in turn will deliver this data to the host computer through
the buses. If the head and the controller cannot keep up with the stream of data passing
under the head, there are two choices: either the rotation should be slower or interleaving
should be used. Using a slower rotation, say, 600 RPM instead of 3600, will give an
unacceptably long access time. That brings us to interleaving. While common sense tells us
that the sectors should be numbered on each track sequentially, a slow head and controller
cannot process the data in sector 1 in time to be ready for sector 2 by the time it is under
the head, so it would have to wait for the next rotation to read sector 2. Likewise, while
sector 2 is being processed, sector 3 has already passed under the head, so to read that
sector it must wait for the next revolution. For a track with 17 sectors, this means that
reading all 17 sectors will take 17 revolutions. This is 1:1 interleaving and is as bad as
slowing down the RPM. In 2:1 interleaving the sectors are numbered and accessed alternately.
If the controller is not fast enough, 3:1 can be used. In 3:1 interleaving, every third
sector is numbered and accessed. It will take 2 complete revolutions to access all the
sectors in 2:1, and 3 revolutions for 3:1 interleaving. This is much better than all the
other choices discussed above. In 3:1 interleaving, the computer accesses sector 1, and by
the time it finishes processing it, sector 2 is under the head. The two sectors between
sectors 1 and 2 give the controller time to get ready for accessing the next sector. Note
that today's high-performance computers using IDE and SCSI controllers and 1:1 interleaving
can read the entire track with one revolution, due to their fast controllers and wide data
buses. Figure 19-4 shows the concept of interleaving.


Figure 19-4. Hard Disk Interleaving
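
The numbering scheme can be illustrated with a short Python sketch. This is only a model under stated assumptions: the function names are hypothetical, the 17-sector track is the one used in the text, and the revolution count assumes a controller that is ready exactly when the next numbered sector arrives (it does not model the slow-controller case in which 1:1 numbering costs one revolution per sector).

```python
def interleaved_layout(sectors_per_track, interleave):
    """Return a list where index = physical slot, value = logical sector number."""
    layout = [None] * sectors_per_track
    slot = 0
    for logical in range(1, sectors_per_track + 1):
        while layout[slot] is not None:          # slot taken: slide to the next free one
            slot = (slot + 1) % sectors_per_track
        layout[slot] = logical
        slot = (slot + interleave) % sectors_per_track
    return layout


def revolutions_to_read_track(layout):
    """Revolutions needed to read logical sectors 1..N in numerical order."""
    n = len(layout)
    slots_passed = 0
    head = 0                                     # slot currently arriving under the head
    for logical in range(1, n + 1):
        target = layout.index(logical)
        slots_passed += (target - head) % n + 1  # wait for the sector, then read it
        head = (target + 1) % n
    return -(-slots_passed // n)                 # ceiling division


print(interleaved_layout(17, 3))
# [1, 7, 13, 2, 8, 14, 3, 9, 15, 4, 10, 16, 5, 11, 17, 6, 12]
for factor in (1, 2, 3):
    print(factor, revolutions_to_read_track(interleaved_layout(17, factor)))
# 1 1
# 2 2
# 3 3
```

In the 3:1 layout, two other sectors sit between sector 1 and sector 2, which is the processing time the text describes.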


Disk caching 


Due to the long access time of the hard disk, disk caching is used to speed up the 
disk access time. There are two types of disk caching. In one type, the disk manufacturer 
puts some fast memory (several megabytes) on the disk. This is called hardware disk
cache (see Table 19-1). In the other type, a section of memory on the PC motherboard is 
set aside for disk caching. Obviously, the larger the size of this memory, the more files can 
be stored there and accessed by the CPU, assuming that there is extra memory to spare. 
Using a section of motherboard DRAM for disk caching is done by the operating system. 
This kind of disk caching is called software disk cache. 




Table 19-1: Seagate LD25 Series Disk Drive Datasheet Data

Specifications                             40 GB             20 GB
                                           GB = 1 billion    GB = 1 billion
Model Number                               ST9402115AS       ST920217AS
                                           ST9402115A        ST920217A
Interface Options                          SATA/150          SATA/150
                                           Ultra ATA/100     Ultra ATA/100
Performance
  Spindle Speed (RPM)                      5400              5400
  Average Latency (msec)                   5.6               5.6
Seek Time
  Average Read/Write (msec)                <16               <16
Transfer Rate
  Maximum Internal (Mbytes/sec)
  Maximum Sustained (Mbytes/sec)           150/100           150/100
Nonrecoverable Read Errors per Bits Read
Power Management
  Startup current (5V, typical Amps)       1
  Seek (typical, watts)                    2.9
  Idle Average (watts)                     2
  Standby Average (watts)
Environmental
  Temperature, Operating (°C)
  Temperature, Nonoperating (°C)           -40 to 70         -40 to 70
  Shock, Operating: 2 msec (Gs)            100               100
  Shock, Nonoperating: 1 msec (Gs)         500               500


Disk reliability 


MTBF (mean time between failures) is a measure of reliability and durability of
the disk when the power is on. This factor is given in hours. For example, the ST225 has
an MTBF of 100,000 hours. Dividing it by 24 hours gives an MTBF value of 4166.6 days
or 11.4 years (4166.6/365). Of course, manufacturers will not power on the disk for that
long and then test it since they would be out of business by then. Instead, they use
statistical analysis to estimate the MTBF.
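
The conversion from hours to years is simple enough to sketch in Python. The function name is hypothetical, and a 365-day year of continuous operation is assumed, as in the text.

```python
def mtbf_hours_to_years(mtbf_hours):
    """Convert an MTBF given in power-on hours to years of continuous operation."""
    days = mtbf_hours / 24    # 24 hours per day
    return days / 365         # 365 days per year


print(round(mtbf_hours_to_years(100_000), 1))  # 11.4
```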




Review Questions 


1. True or false. All sectors have the same capacity (total number of bytes that can be
stored).

2. The very first sector always belongs to the ____ (FAT, boot record).

3. The sectors belonging to the ____ (FAT, directory) are located next to the
boot sector.

4. Why are there two copies for each FAT?

5. True or false. The number of sectors set aside for the FAT varies among the
various-sized disks.

6. The boot record provides the total ____ (byte capacity, number
of sectors) per disk.

7. How does the operating system know how many sectors each track is divided into?

8. Sectors set aside for directories are always ____ (before, after) the FAT sector.

9. In hard disks, which sector is set aside for the boot record?

10. True or false. The SCSI interfacing standard is also used for devices other than
hard disks, such as CD ROM.

11. How many rotations does it take to read all the sectors of a given track in a 3:1
interleaved hard disk?

12. True or false. Each file begins on a new cluster even if the previous cluster has 
some empty space. 

13. How many sectors are in a cluster? 

14. What does "MTBF" stand for, and what does it measure? 

PROBLEMS 


SECTION 19.1: HARD DISK ORGANIZATION AND PERFORMANCE 


1. Why is the hard disk called secondary storage?

2. The first sector of every disk is set aside for ____.

3. After the boot sector, sectors are assigned to the ____ (FAT, directory).

4. For a disk to be bootable, which file must it contain?

5. Why doesn't the DIR command list MSDOS.SYS and IO.SYS on the screen?

6. True or false. The terms hard disk and fixed disk refer to the same thing.

7. What is a cylinder, and how is it used in the hard disk?

8. The total number of tracks in a given hard disk is equal to ____.

9. Calculate the total number of sectors and the capacity of the hard disk on your
computer.

10. True or false. The number of sectors per cluster is always a power of 2.

11. Discuss seek time, settling time, latency time, and how they relate to disk access time.

12. Which of the following has the shortest access time? Which has the longest access
time?
(a) 2400 RPM (b) 3600 RPM (c) 4800 RPM

13. True or false. The SCSI interfacing standard is used only for hard disks.

14. State the number of peripheral devices that can be daisy chained if SCSI is used.

15. To read all the sectors of a given track, how many times must the track rotate under
the head for each of the following interleaving factors?
(a) 1:1 (b) 1:3 (c) Iss

16. What does MTBF stand for, and what is its use?


ANSWERS TO REVIEW QUESTIONS 


SECTION 19.1: HARD DISK ORGANIZATION AND PERFORMANCE 


1. True

2. Boot record

3. FAT

4. To use the second for backup in case something happens to the first one, since the FAT
is the road map for finding where data is located on the disk.

5. True

6. Number of sectors

7. This information is provided in the boot sector.

8. After

9. Sector 0

10. True

11. 3 rotations

12. True

13. 1, 2, 4, 8, 16, 32, or 64; it is always a power of 2. 

14. Mean time between failures; it is a measure of disk longevity and reliability. 




CHAPTER 20 


THE IEEE FLOATING POINT 
AND x87 MATH PROCESSORS 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Diagram the bit assignment of floating-point IEEE standards for 
single- and double-precision data 

>> Convert data from real numbers to IEEE floating-point format 

>> Diagram the bit assignment for x87 data types: word integer, short 
integer, long integer, packed decimal, short real, long real, and temporary 
real 

>> Code Assembly language data directives for x87 data types

>> List the registers of the x87

>> Write Assembly language programs using x87 instructions




This chapter will examine the x87 math coprocessor. In Section 20.1, we study 
the IEEE standard for floating-point numbers and the Intel x87 math coprocessor's data 
format. In Section 20.2, the x87 instructions are discussed along with some sample pro- 
grams using MASM. In Section 20.3, we provide an overview of x87 instructions. 


SECTION 20.1: MATH COPROCESSOR AND IEEE FLOATING-POINT STANDARDS


Using a general-purpose microprocessor such as the 8088/86 to perform mathe- 
matical functions such as log, sine, and others is very time consuming, not only for the 
CPU but also for programmers writing such programs. In the absence of a math coproces- 
sor, programmers must write subroutines using 8088/86 instructions for mathematical 
functions. Although some of these subroutines are already written and can be purchased 
at a small cost, no matter how good the subroutine, its CPU run time (8088/86, 286, 386) 
will still be quite long. Table 20-1 provides a comparison of the number of clocks used by 
the 8087 and 8086 to perform some mathematical functions. One can appreciate the 
advantage of having a coprocessor by comparing the run time of some programs, such as 
SPICE (a package for circuit analysis which uses floating-point operations extensively), 
on a computer with a coprocessor and on one without a coprocessor. In some cases the
difference is hours.


Table 20-1: Comparison of 8087 and 8086 Clock Times

                                Approximate Execution Time (us) (5-MHz clock)
Instruction                     8087            8086 Emulation
Multiply (double precision)
Compare
Load (single precision)

(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1989)


IEEE floating-point standard 


Up to the late 1970s, real numbers (numbers with decimal points) were represent- 
ed differently in binary form by different computer manufacturers. This made many pro- 
grams incompatible for different machines. In 1980, an IEEE committee standardized the 
floating-point data representation of real numbers. This standard, much of which was 
contributed by Intel based on the 8087 math coprocessor, recognized the need for differ- 
ent degrees of precision by different applications; therefore, it established single precision 
and double precision. Since almost all software and hardware companies, including IBM, 
Intel, and Microsoft, now abide by these standards, each one is explained thoroughly. 
RISC processors also use IEEE floating-point standards. 


IEEE single-precision floating-point numbers 


IEEE single-precision floating-point numbers use only 32 bits of data to represent
any real number in the range 2^128 to 2^-126, for both positive and negative numbers. This
translates approximately to a range of 1.2 x 10^-38 to 3.4 x 10^38 in decimal numbers, again
for both positive and negative values. In Intel coprocessor terminology, these
single-precision 32-bit floating-point numbers are referred to as short real. Assignment
of the 32 bits in the single-precision format is


Bit Assignment 

31 Sign bit: 0 for positive (+) and 1 for negative (—) 
23-30 Biased exponent 

22-0 The fraction, also called significand 


To make the hardware design of the math processors much easier and less transis- 
tor consuming, the exponent part is added to a constant of 7FH (127 decimal). This is 
referred to as a biased exponent. Conversion from real to floating point involves the fol- 
lowing steps. 


1. The real number is converted to its binary form.

2. The binary number is represented in scientific form: 1.xxxx E yyyy

3. Bit 31 is either 0 for positive or 1 for negative.

4. The exponent portion, yyyy, is added to 7F to get the biased exponent, which is
placed in bits 23 to 30.

5. The significand, xxxx, is placed in bits 22 to 0.


Examples 20-1, 20-2, and 20-3 demonstrate this process. In Section 20.2 we will 
verify all the above examples using an assembler. 
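The five conversion steps can be traced in a few lines of Python. This is only an illustrative sketch: the function name `to_short_real` is hypothetical, it handles only nonzero normal numbers, and it truncates the fraction to 23 bits rather than rounding as real hardware or an assembler would.

```python
from fractions import Fraction

BIAS = 0x7F  # single-precision exponent bias (127 decimal)

def to_short_real(value):
    """Build the 32-bit pattern of a nonzero normal number by the five steps."""
    sign = 1 if value < 0 else 0            # step 3: sign bit
    mag = Fraction(abs(value))              # step 1: exact binary value
    exp = 0
    while mag >= 2:                         # step 2: normalize to 1.xxxx E exp
        mag /= 2
        exp += 1
    while mag < 1:
        mag *= 2
        exp -= 1
    biased = exp + BIAS                     # step 4: biased exponent
    fraction = int((mag - 1) * (1 << 23))   # step 5: 23 fraction bits (truncated)
    return (sign << 31) | (biased << 23) | fraction

print(f"{to_short_real(9.75):08X}")      # 411C0000
print(f"{to_short_real(0.078125):08X}")  # 3DA00000
```

The two printed patterns match the hex forms worked out by hand in Examples 20-1 and 20-2.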


Example 20-1 


Convert decimal 9.75 to single-precision (short real) floating point.
Solution: 

decimal 9.75 = binary 1001.11 = scientific binary 1.00111 E 3 
Sign bit 31 is 0 for positive. 

Exponent bits 30 to 23 are 1000 0010 (3 + 7F = 82H) after biasing. 
Significand bits 22 to 0 are 001110000000000000000 ...00. 


Putting it all together gives the following binary form, under which is written the hex form: 


0100 0001 0001 1100 0000 0000 0000 0000
  4    1    1    C    0    0    0    0


This can be verified by using an assembler, such as MASM, as will be seen later in this chapter. 


IEEE double-precision floating-point numbers 


Double-precision FP (called long real by Intel) can represent numbers in the range
2.3 x 10^-308 to 1.7 x 10^308, both positive and negative. A total of 52 bits (bits 0 to 51) are
for the significand, 11 bits (bits 52 to 62) are for the exponent, and finally, bit 63 is for the
sign. The conversion process is the same as for single precision in that the real number
must first be represented as 1.xxxxxxx E YYYY, then YYYY is added to 3FF to get the
biased exponent.
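As a quick check of the double-precision layout (1 sign bit, 11 exponent bits, 52 fraction bits, bias 3FFH), the sketch below splits a 64-bit pattern into its fields with Python's standard `struct` module. The helper name `long_real_fields` is an assumption made for illustration.

```python
import struct

def long_real_fields(value):
    """Return (sign, biased exponent, fraction) of a double-precision number."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = bits >> 63                     # bit 63
    biased_exp = (bits >> 52) & 0x7FF     # bits 62 to 52
    fraction = bits & ((1 << 52) - 1)     # bits 51 to 0
    return sign, biased_exp, fraction

sign, biased, frac = long_real_fields(152.1875)
print(sign, hex(biased), biased - 0x3FF, f"{frac:013X}")
# 0 0x406 7 3060000000000
```

For 152.1875 the biased exponent is 406H, that is 7 + 3FFH, which agrees with the hand conversion in Example 20-4.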


Other data formats of the 8087 


In addition to short real (single-precision) and long real (double-precision)
representations for real numbers, the 8087 also supports 16-, 32-, and 64-bit integers. They are
referred to as word integers, short integers, and long integers, respectively, and are shown
in Figure 20-1. These forms are sometimes referred to as signed integer numbers. No
decimal points are allowed in integers, in contrast to real numbers, in which decimal points
are allowed. There are also two 80-bit data formats in the 8087 coprocessor, packed
decimal and temporary real. The packed decimal format has 18 packed BCD digits, which
require a total of 72 bits (18 x 4 = 72). Bits 71 to 0 are used for the digits, bits 72 to
78 are always 0, and bit 79 is for the sign. The temporary real format is used internally by
the 8087 and is shown in Figure 20-1. In the temporary real format, the conversion goes
through the same process as shown above, except that the biased exponent is calculated
by adding the constant 3FFFH.
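The packed decimal layout can be illustrated with a short Python sketch. The function name `to_packed_decimal` is hypothetical, and the bytes are shown most significant first; in memory the x86 stores them in the reverse (little-endian) order.

```python
def to_packed_decimal(n):
    """Return the 10-byte packed BCD image of an integer, sign byte first."""
    digits = f"{abs(n):018d}"            # 18 decimal digits, d17 down to d0
    if len(digits) != 18:
        raise ValueError("packed decimal holds at most 18 digits")
    sign_byte = 0x80 if n < 0 else 0x00  # bit 79 is the sign, bits 78-72 are 0
    # Each pair of decimal digits becomes one byte (two 4-bit BCD digits).
    body = bytes(int(digits[i:i + 2], 16) for i in range(0, 18, 2))
    return bytes([sign_byte]) + body

print(to_packed_decimal(-1234).hex().upper())  # 80000000000000001234
```

Reading two decimal digits as one hexadecimal byte is simply a compact way of placing each digit in its own 4-bit nibble.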


Example 20-2 


Convert decimal 0.078125 to short real (single-precision) IEEE floating-point format.
Solution:

decimal 0.078125 = binary 0.000101 = scientific binary 1.01 E-4

Sign bit 31 is 0 for positive.

Exponent bits 30-23 are 0111 1011 (-4 + 7F = 7BH) after biasing.

Significand bits 22-0 are 01000000....000.

This number will be represented in binary and hex as


0011 1101 1010 0000 0000 0000 0000 0000
  3    D    A    0    0    0    0    0


Example 20-3 


Convert decimal -96.27 to single-precision FP format.

Solution:

decimal 96.27 = binary 1100000.01000101000111101 =
scientific binary 1.10000001000101000111101 E 6

Sign bit 31 is 1 for negative.

Exponent bits 30-23 are 1000 0101 (6 + 7F = 85H) after biasing.
Fraction bits 22-0 are 10000001000101000111101.

The final form in binary and hex is


1100 0010 1100 0000 1000 1010 0011 1101
  C    2    C    0    8    A    3    D


It must be noted that conversion of the decimal portion 0.27 to binary can be continued beyond the
point shown above, but because the fraction part of the single precision is limited to 23 bits, this was
all that was shown. For that reason, double-precision FP numbers are used in some
applications to achieve a higher degree of accuracy.
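The loss of accuracy caused by the 23-bit fraction can be seen by converting -96.27 to single precision and back. The sketch below uses Python's `struct` module as a stand-in for the assembler; the variable names are illustrative only.

```python
import struct

packed = struct.pack(">f", -96.27)          # round to single precision
single = struct.unpack(">f", packed)[0]     # value actually stored

print(packed.hex().upper())                 # C2C08A3D
print(single)                               # close to, but not exactly, -96.27
print(abs(single - (-96.27)))               # error introduced by the 23-bit fraction
```

The stored pattern is the one derived in Example 20-3, and the small nonzero error is why double precision is preferred when more accuracy is needed.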


Example 20-4 


Convert decimal 152.1875 to double-precision FP.

Solution:

decimal 152.1875 = binary 10011000.0011 =

scientific binary 1.00110000011 E 7

Bit 63 is 0 for positive.

Exponent bits 62-52 are 10000000110 (7 + 3FF = 406H) after biasing.
Fraction bits 51-0 are 00110000011000.....000.


0100 0000 0110 0011 0000 0110 0000 0000 0000
  4    0    6    3    0    6    0    0    0


This example will be verified by an assembler in the next section. 




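Word Integer     approx. range: -32768 <= x <= +32767
                 bits 15-0: sign (S) and magnitude (two's complement)

Short Integer    approx. range: -2 x 10^9 <= x <= +2 x 10^9
                 bits 31-0: sign (S) and magnitude (two's complement)

Long Integer     approx. range: -9 x 10^18 <= x <= +9 x 10^18
                 bits 63-0: sign (S) and magnitude (two's complement)

Packed Decimal   approx. range: -99..99 <= x <= +99..99 (18 digits)
                 bit 79: sign (S); bits 78-72: not used; bits 71-0: magnitude, d17 to d0

Short Real       approx. range: 0, 1.2 x 10^-38 <= |x| <= 3.4 x 10^38
                 bit 31: sign (S); bits 30-23: biased exponent; bits 22-0: significand

Long Real        approx. range: 0, 2.3 x 10^-308 <= |x| <= 1.7 x 10^308
                 bit 63: sign (S); bits 62-52: biased exponent; bits 51-0: significand

Temporary Real   approx. range: 0, 3.4 x 10^-4932 <= |x| <= 1.1 x 10^4932
                 bit 79: sign (S); bits 78-64: biased exponent; bit 63: integer bit (i);
                 bits 62-0: significand


Figure 20-1. x87 Data Format

Review Questions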


1. True or false. In the absence of a math processor, the general-purpose processor
must perform all math calculations.

2. True or false. The x87 follows the IEEE floating-point standard.

3. Single-precision IEEE FP standard uses ____ bits to represent data.

4. Double-precision IEEE FP standard uses ____ bits to represent data.

5. To get the biased exponent portion of IEEE single-precision floating-point data we
add ____.

6. To get the biased exponent portion of IEEE double-precision floating-point data we
add ____.


SECTION 20.2: x87 INSTRUCTIONS AND PROGRAMMING 


This section examines the x87 registers and instructions and provides some sam- 
ple programs. 


Assembling and running x87 programs on the x86 PCs


The math coprocessors found in x86 (486, Pentium I-IV, and Itanium) processors 
have their origin in the 8087 coprocessor chip. As far as data types and instructions are 
concerned, there have been few changes since the introduction of the original 8087, aside 
from the fact that x86 processors run 8087 instructions much faster. Originally, the 8087, 
80287, and 80387 math coprocessors were separate chips. In IBM PC and Compatible 
computer motherboards there was a socket for these coprocessors right next to the 8088, 
80286, and 80386 microprocessors. Starting with the 80486, the coprocessor was inte- 
grated with the main processor into a single chip. This section shows how to assemble 
several 8087 programs using the Microsoft Assembler, MASM, run them on a PC, and 
analyze the result. First, the assembler directives for data types of the 8087 are explained. 
In MASM and compatible assemblers, there are different directives to define the different 
data types of the coprocessor. They are as follows: 


DD (Define double word) for short real (single precision) 
DQ (Define quad word) for long real (double precision) 
DD (Define double word) for short integer 

DQ (Define quad word) for long integer 

DT (Define ten bytes) for packed decimal 

DT (Define ten bytes) for temporary real 


Recall that the word size in the x86 family is 16 bits. Therefore, when using DD 
to define a double word, the result is 32 bits. This is different from some other processors, 
notably RISC processors, in which a word is defined as 32 bits. It is worth repeating a 
point made in Chapter 0: Although a byte is defined as 8 bits universally, a word is defined 
differently by different companies. For example, the Cray computer defines a word as 64 
bits. 


Verifying the solution for Examples 20-1 to 20-4 


Program 20-1 is a portion of the .LST file produced when a program is assembled. 
It verifies the conversion from decimal to the internal machine representation given in 
Examples 20-1 through 20-4. 
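The byte order seen in a listing or a DEBUG dump can be confusing at first, because the x86 stores the least significant byte at the lowest address. The following Python sketch mimics the DD and DQ directives for real numbers; the helper names `dd`, `dq`, and `hexbytes` are hypothetical and exist only for this illustration.

```python
import struct

def dd(value):
    """Memory image of a short real (DD), lowest address first."""
    return struct.pack("<f", value)

def dq(value):
    """Memory image of a long real (DQ), lowest address first."""
    return struct.pack("<d", value)

def hexbytes(data):
    return " ".join(f"{b:02X}" for b in data)

print(hexbytes(dd(9.75)))       # 00 00 1C 41
print(hexbytes(dd(0.078125)))   # 00 00 A0 3D
print(hexbytes(dd(-96.27)))     # 3D 8A C0 C2
print(hexbytes(dq(152.1875)))   # 00 00 00 00 00 06 63 40
```

Reading each line from right to left gives the hex forms 411C0000, 3DA00000, C2C08A3D, and 4063060000000000 obtained in Examples 20-1 through 20-4.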


x87 registers 


There are only eight general-purpose registers in the x87. Rather than having
different-size registers for different-size operands, all the registers of the 8087 are 80 bits
wide. Every time the 8087 loads an operand, it automatically converts it to this 80-bit
format. This gives uniformity to the registers and makes programming, as well as 8087
hardware design, much easier. Although these eight registers have been numbered from 0 to 7,
they are accessed like a stack, meaning that a last-in-first-out policy is used. At any given
time, the top of the stack is referred to as ST(0), or simply ST, and all other registers,
regardless of their number, are referred to according to their positions compared to the top
of the stack, ST. The programming examples below will demonstrate the use of registers




                                .8087
                                PAGE 60,132
                                ORG  00H
00 00 1C 41                     EX1  DD  9.75        ;example 20-1
                                ORG  10H
00 00 A0 3D                     EX2  DD  0.078125    ;example 20-2
                                ORG  20H
3D 8A C0 C2                     EX3  DD  -96.27      ;example 20-3
                                ORG  30H
00 00 00 00 00 06 63 40         EX4  DQ  152.1875    ;example 20-4


Example 20-1 data:
hex: 41 1C 00 00
binary: 0100 0001 0001 1100 0000 0000 0000 0000
sign: 0 for positive
biased exp: 1000 0010 normalize: 82 - 7F = 3
significand: 0011100..00
scientific binary: 1.00111000..00 E3 = 1001.11000...00
decimal: 9.75


Example 20-2 data:
hex: 3D A0 00 00
binary: 0011 1101 1010 0000 0000 0000 0000 0000
sign: 0 for positive
biased exp: 0111 1011 normalize: 7B - 7F = -4
significand: 01000..00
scientific binary: 1.01 E-4 = .000101
decimal: 0.078125


Example 20-3 data:
hex: C2 C0 8A 3D
binary: 1100 0010 1100 0000 1000 1010 0011 1101
sign: 1 for negative
biased exp: 1000 0101 normalize: 85 - 7F = 6
significand: 10000001000101000111101
scientific binary: 1.10000001000101000111101 E6 = 1100000.01000101000111101
decimal: -96.2699966


Example 20-4 data:
hex: 40 63 06 00 00 00 00 00
binary: 0100 0000 0110 0011 0000 0110 000..00
sign: 0 for positive
biased exp: 10000000110 normalize: 406 - 3FF = 7
significand: 00110000011
scientific binary: 1.00110000011 E7 = 10011000.0011
decimal: 152.1875


Program 20-1 




in the 8087. Example 20-5 will show a complete Assembly language program using the 
8087 coprocessor. First, a few points should be noted: 


1. All x87 mnemonics start with the letter "f" to distinguish them from x86
instructions.

2. The x87 must be initialized to make sure that the top of the stack will be register
number 7.

3. Whenever a register is not identified specifically, ST [which is ST(0)] is assumed
automatically.

4. ST(0) is the top of the stack, ST(1) is one register below that, and ST(2) is two
registers below ST(0), and so on. In other words, for register ST(m), the number in
parentheses, m, has nothing to do with the register number. There is a way to find
out which register number, 0-7, is ST(0), the top of the stack.

5. In the following programming examples, all values of X, Y, and Z have been
defined in the data segment and allocated memory locations. The same is true for
variables, such as SUM, for storing the result.
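The relationship between ST(m) and the physical register numbers can be modeled in a few lines of Python. This is a simplified sketch, not a description of the hardware: the class name `X87Stack` and its methods are hypothetical, and it assumes the usual convention that initialization sets the top-of-stack pointer to 0 and that each load decrements it modulo 8, so the first value loaded lands in register 7.

```python
class X87Stack:
    """Toy model of the eight x87 registers accessed as a stack."""

    def __init__(self):
        self.finit()

    def finit(self):
        self.top = 0                  # top-of-stack pointer after initialization
        self.regs = [None] * 8        # physical registers 0 to 7, all empty

    def physical(self, i=0):
        """Physical register number currently called ST(i)."""
        return (self.top + i) & 7

    def fld(self, value):
        """Push: move the top pointer down one register, then store."""
        self.top = (self.top - 1) & 7
        self.regs[self.top] = value

    def st(self, i=0):
        return self.regs[self.physical(i)]


stack = X87Stack()
for name in ("X", "Y", "Z"):
    stack.fld(name)

print(stack.st(0), stack.physical(0))   # Z 5
print(stack.st(2), stack.physical(2))   # X 7
```

After three loads, ST(0) is physical register 5 and ST(2) is physical register 7, which shows why the number in ST(m) says nothing about the register number.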


Example 20-5 


Write an 8087 program that loads three values for X, Y, and Z, adds them, and stores the result. 
Solution: 


        finit           ;initialize the 8087 to start at the top of stack
        fld   X         ;load X into ST(0). now ST(0)=X
        fld   Y         ;load Y into ST(0). now ST(0)=Y and ST(1)=X
        fld   Z         ;load Z into ST(0). now ST(0)=Z, ST(1)=Y, ST(2)=X
        fadd  ST(1)     ;add Y to Z and save the result in ST(0)
        fadd  ST(2)     ;add X to (Y+Z) and save it in ST(0)
        fst   sum       ;store ST(0) in memory location called sum.


Now the same program can be written as follows: 


        finit
        fld   X         ;load x, now ST(0)=x
        fld   Y         ;load y, now ST(0)=y, ST(1)=x
        fld   Z         ;load z, now ST(0)=z, ST(1)=y, ST(2)=x
        fadd  ST(1)     ;ST(0) = y + z
        fadd  ST(2)     ;ST(0) = x + (y + z)
        fst   sum


Program 20-2 shows the actual MASM code and execution. Figure 20-2 shows the registers. 
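The arithmetic of Example 20-5 can be followed with a small Python trace. It is only a sketch under simplifying assumptions: a Python list stands in for the register stack, the helper names are hypothetical, and ordinary double-precision arithmetic replaces the 80-bit internal format (which makes no difference for these exactly representable values).

```python
import struct

def as_short_real(value):
    """Round a value to single precision, as loading a DD variable would."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

stack = []                        # stack[0] plays the role of ST(0)

def fld(value):
    stack.insert(0, as_short_real(value))

def fadd(i):
    stack[0] = stack[0] + stack[i]    # ST(0) = ST(0) + ST(i)

def fst():
    return struct.pack("<f", stack[0])  # memory image of the stored short real

fld(9.75)          # X
fld(13.09375)      # Y
fld(29.0390625)    # Z
fadd(1)            # ST(0) = Z + Y
fadd(2)            # ST(0) = Z + Y + X
print(stack[0])                                  # 51.8828125
print(" ".join(f"{b:02X}" for b in fst()))       # 00 88 4F 42
```

The four bytes printed are the ones that appear at offset 0030 in the DEBUG dump of the data segment.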


Often, an application requires the use of both real and integer numbers. Real 
numbers can be rounded into integers by using the x87 instruction FRNDINT, as shown 
in Program 20-3, which includes a procedure to round real numbers. The sample data used 
is the real number 5.5. In addition, the data is analyzed to see how the number was round- 
ed. FRNDINT rounds real numbers to integers by rounding up, rounding down, truncat- 
ing, or rounding to the nearest integer. How real numbers are rounded is determined by 
the RC (round control) bits in the control word (see Figure 20-3). 
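The four rounding choices can be compared side by side in Python. The sketch below assumes the standard Intel encoding of the RC field (00 = nearest, 01 = down, 10 = up, 11 = truncate); the function name `frndint` and the table `ROUNDING` are illustrative only.

```python
import math

# RC field value -> rounding behaviour (assumed standard Intel encoding)
ROUNDING = {
    0b00: round,        # round to nearest, ties go to the even integer
    0b01: math.floor,   # round down (toward minus infinity)
    0b10: math.ceil,    # round up (toward plus infinity)
    0b11: math.trunc,   # truncate (toward zero)
}

def frndint(value, rc=0b00):
    """Round value to an integer according to the round-control setting."""
    return ROUNDING[rc](value)

for rc in (0b00, 0b01, 0b10, 0b11):
    print(f"RC={rc:02b}: 5.5 -> {frndint(5.5, rc)}, -5.5 -> {frndint(-5.5, rc)}")
```

With the default setting (nearest), 5.5 rounds to 6, which is the integer stored by Program 20-3.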




;Program for Example 20-5 to load 3 numbers and compute their sum
        .8087
        PAGE 60,132
        .MODEL SMALL
        .STACK
        .DATA
        ORG  00H
X       DD   9.75
        ORG  10H
Y       DD   13.09375
        ORG  20H
Z       DD   29.0390625
        ORG  30H
SUM     DD   ?


        MOV  AX,@DATA
        MOV  DS,AX
CSUM 


        FINIT                   ;initialize 8087 stack
        FLD  X                  ;load X into ST(0)
        FLD  Y                  ;load Y into ST(0)
        FLD  Z                  ;load Z into ST(0)
        FADD ST(0),ST(1)        ;ST(0) = Y + Z
        FADD ST(0),ST(2)        ;ST(0) = X + Y + Z
        FST  SUM                ;store ST(0) in sum


EX5.EXE

16EF:0000 B8B617        MOV     AX,17B6
16EF:0003 8ED8          MOV     DS,AX
16EF:0005 E80400        CALL    000C
16EF:0008 B44C          MOV     AH,4C
16EF:000A CD21          INT     21
16EF:000C 9B            WAIT

Program terminated normally
-D 17B6:0 3F
17B6:0000  00 00 1C 41 00 00 00 00-00 00 00 00 00 00 00 00   ...A............
17B6:0010  00 80 51 41 00 00 00 00-00 00 00 00 00 00 00 00   ..QA............
17B6:0020  00 50 E8 41 00 00 00 00-00 00 00 00 00 00 00 00   .PhA............
17B6:0030  00 88 4F 42 00 00 00 00-00 00 00 00 00 00 00 00   ..OB............
-Q

Program 20-2 


[Stack diagram panels: (a) FINIT; (e) FADD ST(0),ST(1); (f) FADD ST(0),ST(2)]


Figure 20-2. Stack Diagram for Example 20-5 


X:
hex:          41 1C 00 00      binary: 0100 0001 0001 1100 0000 0000 0000 0000
sign:         0 for positive   biased exp: 1000 0010   normalize: 82 - 7F = 3
significand:  0011 1000 00..00
sci. binary:  1.00111000..00 E3 = 1001.11000..00
decimal:      9.75

Y:
hex:          41 51 80 00      binary: 0100 0001 0101 0001 1000 0000 0000 0000
sign:         0 for positive   biased exp: 1000 0010   normalize: 82 - 7F = 3
significand:  1010001100..00
sci. binary:  1.1010001100..00 E3 = 1101.00011
decimal:      13.09375

Z:
hex:          41 E8 50 00      binary: 0100 0001 1110 1000 0101 0000 0000 0000
sign:         0 for positive   biased exp: 1000 0011   normalize: 83 - 7F = 4
significand:  1101000010100..00
sci. binary:  1.1101000010100..00 E4 = 11101.0000101
decimal:      29.0390625

SUM:
hex:          42 4F 88 00      binary: 0100 0010 0100 1111 1000 1000 0000 0000
sign:         0 for positive   biased exp: 1000 0100   normalize: 84 - 7F = 5
significand:  10011111000100..00
sci. binary:  1.100111110001 E5 = 110011.1110001
decimal:      51.8828125


Program 20-4 calculates the area of a circle. The 8087 has instructions that load
the top of the stack, ST(0), with a constant. For example, FLDPI loads PI into ST. To
calculate the square of a number, the register is multiplied by itself with FMUL. FMUL can
have two operands, one of which must be ST(0), such as "FMUL ST(2),ST(0)", where ST(2)
is multiplied by ST(0) and the result is placed in ST(2). If no operands are given, the
operation is assumed to be "FMUL ST(0),ST(1)", so that the first two stack registers are
multiplied together and the result is stored in ST(0).
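The circle-area calculation can be cross-checked in Python. This is a sketch only: double-precision arithmetic stands in for the 80-bit internal format, and the helper `short_real_bytes` is a hypothetical name. The radius 91.67 is first rounded to single precision, as happens when it is defined with DD.

```python
import math
import struct

def short_real_bytes(value):
    """Memory image of a short real, lowest address first."""
    return struct.pack("<f", value)

r = struct.unpack("<f", short_real_bytes(91.67))[0]   # radius as stored by DD
area = math.pi * r * r                                # PI times R squared

print(" ".join(f"{b:02X}" for b in short_real_bytes(r)))      # 0A 57 B7 42
print(" ".join(f"{b:02X}" for b in short_real_bytes(area)))   # 0C 40 CE 46
```

Both byte sequences agree with the DEBUG dump shown with Program 20-4.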


        .8087
        PAGE 60,132
        .MODEL SMALL
;PROGRAM TO ROUND A REAL NUMBER TO AN INTEGER
        .STACK
        .DATA
        ORG  00H
REALNUM DD   5.5
        ORG  10H
INTNUM  DD   ?
        .CODE
MAIN    PROC FAR
        MOV  AX,@DATA
        MOV  DS,AX
        CALL RND_NUM
        MOV  AH,4CH
        INT  21H
MAIN    ENDP

;PROCEDURE TO ROUND A REAL NUMBER TO AN INTEGER
RND_NUM PROC NEAR
        FINIT                   ;initialize 8087
        FLD  REALNUM            ;load real
        FRNDINT                 ;round to integer
        FIST INTNUM             ;store integer
        RET
RND_NUM ENDP
        END  MAIN


Program terminated normally

-d 1065:0 1f
1065:0000  00 00 B0 40 00 00 00 00-00 00 00 00 00 00 00 00   ..0@............
1065:0010  06 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
-q

The verification of the above dump is left to the reader as an exercise. It must be noted that the
control word can be accessed in order to see which rounding method was used.


Program 20-3




        .8087
        PAGE 60,132
        .MODEL SMALL
;PROGRAM TO CALCULATE AREA OF A CIRCLE (radius 91.67)
        .STACK
        .DATA
        ORG  00H
R       DD   91.67
        ORG  10H
AREA    DD   ?
        .CODE
MAIN    PROC FAR
        MOV  AX,@DATA
        MOV  DS,AX
        CALL CIRC_AREA
        MOV  AH,4CH
        INT  21H
MAIN    ENDP

;PROCEDURE TO CALCULATE THE AREA OF A CIRCLE
CIRC_AREA PROC NEAR
        FINIT                   ;initialize 8087
        FLD  R                  ;load radius
        FMUL ST(0),ST(0)        ;square R
        FLDPI                   ;load PI
        FMUL ST(0),ST(1)        ;multiply PI by R squared
        FST  AREA               ;store AREA
        RET
CIRC_AREA ENDP
        END  MAIN


The data dumped in DEBUG looked as follows:
-d 1065:0 1f
1065:0000  0A 57 B7 42 00 00 00 00-00 00 00 00 00 00 00 00   .W7B............
1065:0010  0C 40 CE 46 00 00 00 00-00 00 00 00 00 00 00 00   .@NF............

r = 42 B7 57 0A Hex
binary = 0100 0010 1011 0111 0101 0111 0000 1010
sign: 0 for positive
biased exp: 1000 0101 normalize: 85 - 7F = 6
significand: 01101110101011100001010
sci. binary: 1.01101110101011100001010 E6 = 1011011.10101011100001010
decimal: 91.6699982


area = 46 CE 40 0C
binary = 0100 0110 1100 1110 0100 0000 0000 1100
sign: 0 for positive
biased exp: 1000 1101 normalize: 8D - 7F = 14
significand: 10011100100000000001100
sci. binary: 1.10011100100000000001100 E14 = 110011100100000.000001100
decimal: 26400.0234375


Program 20-4




Example 20-6 


Write, run, and analyze an 8087 program to calculate sin, cosine, tan, and cotan of a 30-degree 
angle. 


Solution: 


First the 30-degree angle must be converted to radians: (PI/180) x 30 = 0.523598776 radian. 

The program is Program 20-5. The data dump in DEBUG, after Program 20-5 is run, is as follows: 
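-D 1065:0 8E
1065:0000  92 0A 06 3F 00 00 00 00-00 00 00 00 00 00 00 00
1065:0010  15 06 BA 3F 00 00 00 00-00 00 00 00 00 00 00 00
1065:0020  45 CD 56 3F 00 00 00 00-00 00 00 00 00 00 00 00
1065:0030  45 CD D6 3F 00 00 00 00-00 00 00 00 00 00 00 00
1065:0040  00 00 00 3F 00 00 00 00-00 00 00 00 00 00 00 00
1065:0050  D8 B3 5D 3F 00 00 00 00-00 00 00 00 00 00 00 00
1065:0060  3A CD 13 3F 00 00 00 00-00 00 00 00 00 00 00 00
1065:0070  D8 B3 DD 3F 00 00 00 00-00 00 00 00 00 00 00 00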

MONERO IE 2B CO 50 65 10 8E-D8 .+@P8e..Xh. .h#.h 


Data analysis:


sin = 3F 00 00 00
binary: 0011 1111 0000 0000 0000 0000 0000 0000
sign: 0 for positive
biased exp: 0111 1110 normalize: 7E - 7F = -1
significand: 000000..00
sci. binary: 1.00..00 E-1 = .1
decimal: .5


cos = 3F 5D B3 D8
binary: 0011 1111 0101 1101 1011 0011 1101 1000
sign: 0 for positive
biased exp: 0111 1110 normalize: 7E - 7F = -1
significand: 10111011011001111011000
sci. binary: 1.10111011011001111011000 E-1 = .110111011011001111011000
decimal: .8660254478


tan = 3F 13 CD 3A
binary: 0011 1111 0001 0011 1100 1101 0011 1010
sign: 0 for positive
biased exp: 0111 1110 normalize: 7E - 7F = -1
significand: 00100111100110100111010
sci. binary: 1.00100111100110100111010 E-1 = .100100111100110100111010
decimal: .577350259


cot = 3F DD B3 D8
binary: 0011 1111 1101 1101 1011 0011 1101 1000
sign: 0 for positive
biased exp: 0111 1111 normalize: 7F - 7F = 0
significand: 10111011011001111011000
sci. binary: 1.10111011011001111011000 E0
decimal: 1.732050896




        .8087
        PAGE 60,132
        .MODEL SMALL
;program to calculate SIN, COS, TAN, and COT of a 30-degree angle
        .STACK 32
        .DATA
ANGLE   DD   0.523598776        ;angle in radians for 30 degrees
        ORG  10H
X       DD   0
        ORG  20H
Y       DD   0
        ORG  30H
R       DD   0

        MOV  AX,@DATA
        MOV  DS,AX
        CALL CALC_X_Y
        CALL CALC_R
        CALL CALC_SIN
        CALL CALC_COS
        CALL CALC_TAN
        CALL CALC_COT

;procedure to calculate X and Y given an angle
CALC_X_Y PROC NEAR
        FINIT                   ;initialize 8087
        FLD  ANGLE              ;load ANGLE onto stack
        FPTAN                   ;calculate X and Y
        FSTP X                  ;store X and POP
        FSTP Y                  ;store Y and POP
        RET
CALC_X_Y ENDP

;procedure to calculate hypotenuse given X and Y
CALC_R  PROC NEAR
        FINIT                   ;initialize 8087
        FLD  X                  ;load X onto stack
        FMUL ST(0),ST(0)        ;square X
        FLD  Y                  ;load Y onto stack
        FMUL ST(0),ST(0)        ;square Y
        FADD ST(0),ST(1)        ;calculate X**2 + Y**2
        FSQRT                   ;take square root
        FST  R                  ;store R
        RET
CALC_R  ENDP

;procedure to calculate SIN, given R and Y
CALC_SIN PROC NEAR
        FINIT                   ;initialize 8087
        FLD  R                  ;load R onto stack
        FLD  Y                  ;load Y onto stack
        FDIV ST(0),ST(1)        ;SIN = Y/R
        FST  SIN                ;store SIN
        RET
CALC_SIN ENDP

;procedure to calculate COS, given R and X
CALC_COS PROC NEAR
        FINIT                   ;initialize 8087
        FLD  R                  ;load R onto stack
        FLD  X                  ;load X onto stack
        FDIV ST(0),ST(1)        ;COS = X/R
        FST  COS                ;store COS
        RET
CALC_COS ENDP

;procedure to calculate TAN, given X and Y
CALC_TAN PROC NEAR
        FINIT                   ;initialize 8087
        FLD  X                  ;load X onto stack
        FLD  Y                  ;load Y onto stack
        FDIV ST(0),ST(1)        ;TAN = Y/X
        FST  TAN                ;store TAN
        RET
CALC_TAN ENDP

;procedure to calculate COT, given X and Y
CALC_COT PROC NEAR
        FINIT                   ;initialize 8087
        FLD  Y                  ;load Y onto stack
        FLD  X                  ;load X onto stack
        FDIV ST(0),ST(1)        ;COT = X/Y
        FST  COT                ;store COT
        RET
CALC_COT ENDP


Program 20-5

Trig functions 


Example 20-6 uses trig functions. The instruction FPTAN (partial tangent) calcu- 
lates Y/X = TAN Z, where Z is the angle in radians and must be 0 < Z < PI/4. Z is stored 
in ST(0) prior to execution of FPTAN. After the execution, ST(0) = X and ST(1) = Y. 
Then X and Y are used to calculate the hypotenuse R. After that, it is easy to calculate the 
sine, cosine, tangent, and cotangent. This process is shown in Program 20-5. 

Note in Example 20-6 that in order to calculate sine and cosine we had to use the 
tangent; however, starting with the 80387 coprocessor there are specific instructions such 
as FSIN (sine) and FCOS (cosine) for these purposes. To invoke these instructions one 
must use the .387 directive for the assembler. 
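The method used in Example 20-6, deriving all four functions from the partial tangent, can be summarized in a short Python sketch. It is illustrative only: `math.tan` stands in for FPTAN, X is simply taken as 1 (the 8087 returns a Y and X pair whose ratio is the tangent, not necessarily with X = 1), and the function name is hypothetical.

```python
import math

def trig_from_partial_tangent(z):
    """Derive sin, cos, tan, and cot from a pair Y, X with Y/X = tan(z)."""
    y = math.tan(z)                  # stand-in for FPTAN, assuming X = 1
    x = 1.0
    r = math.sqrt(x * x + y * y)     # hypotenuse
    return {"sin": y / r, "cos": x / r, "tan": y / x, "cot": x / y}

angle = math.pi / 180 * 30           # 30 degrees in radians
for name, value in trig_from_partial_tangent(angle).items():
    print(f"{name} = {value:.9f}")
```

The printed values agree, to single-precision accuracy, with the results decoded from the DEBUG dump in Example 20-6.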


Integer numbers 


Although performance of real numbers in the x87 is very impressive, integer 
operations should not be overlooked. One way to appreciate this performance is to com- 
pare the addition of two multibyte numbers, each 64 bits, on the 8088/86/286 and on the 
8087. Since in the 8086/88/286, AX is only 16 bits wide, it will take a loop of 4 iterations 
to add a 64-bit number plus the overhead of moving the four 16-bit words in for each num- 
ber and moving the result in and out of the CPU. The same addition can be performed by 




;This program adds two positive integer numbers and stores the result.
        .8087
        PAGE 60,132
        .MODEL SMALL
        .STACK 32

        MOV  AX,@DATA
        MOV  DS,AX
        CALL ADD_INT

ADD_INT PROC NEAR
        FINIT                   ;initialize 8087
        FILD  INT1              ;load integer 1 onto stack
        FIADD INT2              ;add the second integer
        FIST  SUM               ;store result in SUM




Program 20-6 


the x87 with only 4 instructions. In the x87, integer number instructions are distinguished 
from real number instructions by the letter "I". For example, the instruction FILD loads 
an integer number into ST(0) while the FLD would do the same thing for real numbers.
One important point about differences between real and integer negative numbers is that 
integer negative numbers are stored in 2's complement. In real negative numbers, the only 
difference between a number and its negative is the sign bit. These numbers are not stored 
in 2's complement. Program 20-6 will show the assembler representation of a negative 
integer. 
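The difference between the two ways of representing negative numbers is easy to see by printing the bit patterns. The sketch below uses Python's `struct` module; the helper names are hypothetical and the patterns are shown most significant byte first.

```python
import struct

def word_integer(n):
    """16-bit signed integer pattern (2's complement)."""
    return struct.pack(">h", n).hex().upper()

def short_real(value):
    """32-bit short real pattern."""
    return struct.pack(">f", value).hex().upper()

print(word_integer(5), word_integer(-5))      # 0005 FFFB
print(short_real(9.75), short_real(-9.75))    # 411C0000 C11C0000
```

Negating the integer changes almost every bit, whereas negating the real number changes only the sign bit (411C0000 becomes C11C0000).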




Review Questions 


1. In Assembly language programming, which data directive is used for
single-precision data?

2. In Assembly language programming, which data directive is used for
double-precision data?

3. State the number of general-purpose registers in the x87.

4. x87 registers are accessed according to ____ (LIFO, FIFO).

5. True or false. In the 8086, "AX" always refers to the same physical register, but in
the x87 "ST(2)" could be assigned to different physical registers at different times.

6. The top of stack is referred to as ____.

7. ST(1) is the register ____ (above, below) ST(0).

8. What does "FADD ST(4)" do? What are the operands, and where is the result
kept?

9. What is the purpose of the instruction "FINIT"?

10. Instructions using integer data have the letter ____ as part of their mnemonics.

11. What is the difference between the ADD and FADD instructions?

12. True or false. The 8087 has an instruction named "FSIN" to calculate sine.


SECTION 20.3: x87 INSTRUCTIONS 


This section provides an overview of the x87 instructions. 

Real transfers

FLD src         ;pushes source operand onto ST(0)
                ;source may be ST(i) or memory
                ;"FLD ST(0)" duplicates stack top

FST dest        ;copies ST(0) to destination
                ;dest may be ST(i), short or long real variable

FSTP dest       ;copies ST(0) to dest then pops ST(0)
                ;dest may be ST(i), short, long or temporary
                ;real memory
                ;"FSTP ST(0)" is equivalent to popping
                ;the stack with no data transfer

FXCH dest       ;swaps contents of ST(0) and destination
                ;FXCH with no operands swaps ST(0) and ST(1)
                ;Tip: frequently used to move a register to
                ;the top before using an instruction which
                ;assumes ST(0)


Integer transfers

FILD src        ;converts source to temporary real and pushes
                ;onto ST(0)

FIST dest       ;rounds ST(0) to integer and copies to
                ;destination
                ;dest may be a word or short integer

FISTP dest      ;functions the same as FIST but then pops
                ;ST(0)
                ;dest may be any binary integer data type




Packed decimal transfers

FBLD src        ;converts source contents to temporary real
                ;then pushes onto ST(0)

FBSTP dest      ;converts ST(0) to BCD and stores at
                ;destination, then pops stack


Addition

FADD dest,src   ;adds src to dest, storing result in dest
                ;If no operands are given, ST(0) becomes
                ;ST(0) + ST(1)
                ;If one operand is given, destination is
                ;ST(0), source is operand
                ;source may be ST(i) or real data variable
                ;dest may be ST(i)

FADDP dest,src  ;adds src to dest then pops ST(0)
                ;dest may be ST(i), src is ST(0)

FIADD src       ;adds src to ST(0)


Subtraction

FSUB dest,src   ;subtracts src from dest, stores result in
                ;dest
                ;If no operands are given, ST(1) becomes
                ;ST(1) - ST(0)
                ;If one operand is given, it will be src with
                ;ST(0) as the destination

FSUBP dest,src  ;subtract src from dest and store in dest
                ;src is ST(0), dest is ST(i)

FISUB src       ;subtract source from ST(0) and store in ST(0)


Reversed subtraction

FSUBR dest,src  ;functions the same as FSUB but subtracts
                ;dest from src
                ;and stores the result in dest (R is for
                ;reverse)

FSUBRP dst,src  ;functions the same as FSUBP but subtracts
                ;dest from src
                ;instead of src from destination

FISUBR src      ;operates the same as FISUB but subtracts
                ;ST(0) from source
                ;and stores in ST(0)


Multiplication

FMUL dest,src   ;multiplies dest by src and stores result in
                ;dest
                ;If no operands are given, ST(0) becomes
                ;ST(0) x ST(1)
                ;If one operand is given, destination is
                ;ST(0), source is operand
                ;source may be ST(i) or real data variable
                ;dest may be ST(i)

FMULP dest,src  ;multiplies dest by src, stores result in
                ;dest and pops src
                ;src is ST(0), dest is ST(i)

FIMUL src       ;multiplies ST(0) by src and stores result in
                ;ST(0)

Division

FDIV dest,src   ;divides dest by src and stores result in
                ;dest
                ;If no operands are given, ST(0) becomes
                ;ST(0) / ST(1)
                ;If one operand is given, ST(0) is divided by
                ;the src operand
                ;source may be ST(i) or real data variable
                ;dest may be ST(i)

FDIVP dest,src  ;divides dest by src and stores result in
                ;dest, then pops src
                ;src is ST(0), dest is ST(i)

FIDIV src       ;divides ST(0) by src and stores result in
                ;ST(0)


Reversed division

FDIVR dest,src  ;functions identical to FDIV except src is
                ;divided by dest

FDIVRP dst,src  ;functions the same as FDIVP except src is
                ;divided by dest

FIDIVR src      ;functions the same as FIDIV except src is
                ;divided by ST(0)


Other arithmetic instructions

FSQRT           ;replaces ST(0) with its square root

FSCALE          ;replaces ST(0) with ST(0) x 2^n, where n is
                ;the integer in ST(1)
                ;this provides fast method of multiplying by
                ;integral powers of 2

FPREM           ;ST(1) is repeatedly subtracted from ST(0)
                ;until ST(0) < ST(1)
                ;same as ST(0) mod ST(1)

FRNDINT         ;rounds ST(0) to an integer
                ;rounds according to RC (round control) bits
                ;in the control word

FXTRACT         ;extracts the exponent from ST(0) and places
                ;it in ST(1)
                ;extracts the significand from ST(0) and places
                ;it in ST(0)

FABS            ;replaces ST(0) with its absolute value

FCHS            ;reverses sign bit in ST(0)
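Three of these instructions have close counterparts in Python's `math` module, which makes their effect easy to try out. The sketch below is an analogy, not an emulation: the function names mirror the mnemonics for readability, `fscale` assumes ST(1) holds an integer value, and `fxtract` assumes a nonzero operand.

```python
import math

def fscale(st0, st1):
    """ST(0) x 2^n, where n is the integer in ST(1)."""
    return math.ldexp(st0, int(st1))

def fprem(st0, st1):
    """Remainder of ST(0) / ST(1); the sign follows ST(0)."""
    return math.fmod(st0, st1)

def fxtract(st0):
    """Split into (significand, exponent) with 1 <= |significand| < 2."""
    m, e = math.frexp(st0)        # st0 = m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1

print(fscale(3.0, 4.0))     # 48.0
print(fprem(7.5, 2.0))      # 1.5
print(fxtract(9.75))        # (1.21875, 3)
```

The last line shows the same decomposition used by hand in Example 20-1: 9.75 = 1.21875 x 2^3, that is, binary 1.00111 E 3.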


Compare instructions 


The following instructions compare ST(0) with the source operand and set condition
code bits C3, C2, and C0 of the status word as follows:


C3  C2  C0
0   0   0   ST(0) > source
0   0   1   ST(0) < source
1   0   0   ST(0) = source
1   1   1   numbers cannot be compared

FCOM ;compare ST(0) with real operand
FICOM ;compare ST(0) with integer operand


The source operand may be ST(i) or a real number. If no source operand is given,
ST(1) is assumed.
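The compare result is normally read back by storing the status word (FSTSW) and testing the three condition bits. The following Python sketch models only that decoding step. It assumes the 8087 status word layout in which C0 is bit 8, C2 is bit 10, and C3 is bit 14; the function name and the returned strings are illustrative and not part of any assembler or library.

```python
def compare_result(status_word):
    """Decode C3, C2, C0 of an x87 status word after FCOM/FICOM."""
    c0 = status_word >> 8 & 1    # assumed position: bit 8
    c2 = status_word >> 10 & 1   # assumed position: bit 10
    c3 = status_word >> 14 & 1   # assumed position: bit 14
    outcomes = {
        (0, 0, 0): "ST(0) > source",
        (0, 0, 1): "ST(0) < source",
        (1, 0, 0): "ST(0) = source",
        (1, 1, 1): "numbers cannot be compared",
    }
    # Any other bit pattern is not listed in the compare table.
    return outcomes.get((c3, c2, c0), "undefined")


print(compare_result(0x0100))   # only C0 set
print(compare_result(0x4000))   # only C3 set
```

In an Assembly language program the same test is usually done by moving the stored status word into AX and examining AH.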


The FXAM (examine) instruction classifies the contents of ST(0) and sets the condition
codes as follows:


C3  C2  C1  C0  Meaning
0   0   0   0   +unnormal
0   0   0   1   +NAN (not a number)
0   0   1   0   -unnormal
0   0   1   1   -NAN
0   1   0   0   +normal
0   1   0   1   +infinite
0   1   1   0   -normal
0   1   1   1   -infinite
1   0   0   0   +0
1   0   0   1   empty
1   0   1   0   -0
1   0   1   1   empty
1   1   0   0   +denormal
1   1   0   1   empty
1   1   1   0   -denormal
1   1   1   1   empty

Transcendental instructions 

FPTAN ;computes tangent of theta = y/x
;theta is in ST(0) and must be between 0 and
;pi/4
;after the ratio is computed, y replaces theta
;in ST(0)
;and x is pushed onto the stack, becoming the
;new top of stack


FPATAN ;computes theta = arctan (y/x)
;x is in ST(0) and y in ST(1)
;ST(0) is popped, theta is written over y in
;ST(1), the new stack top


F2XM1 ;computes y = 2^x - 1
;x is taken from ST(0) and must be in the
;range: -1 to +1
;y replaces x in ST(0)


FYL2X ;computes z = y times log2 x
;x is taken from ST(0) and y from ST(1)
;x must be greater than 0
;z replaces y, which becomes the new stack top
;as x is popped off


FYL2XP1 ;computes z = y times log2 (x+1)
;x is from ST(0) and is in the range:
;0 < |x| < (1 - sqrt(2)/2)
;y is taken from ST(1)
;z replaces y, which becomes the new stack top
;as x is popped off


The following instructions are available only in 387 and later coprocessors. 


FSIN ;computes the sine of ST(0), where ST(0) is an
;angle in radians
;the result replaces ST(0)


FCOS ;same as FSIN, but computes the cosine of ST(0)
;the result replaces ST(0)


FSINCOS ;computes both sin and cos of ST(0)
;sin replaces ST(0), then cos is pushed, leaving
;cos in ST(0) and sin in ST(1)


Constant instructions 


FLDZ ;pushes +0.0 onto the stack
FLD1 ;pushes +1.0 onto the stack
FLDPI ;pushes pi onto the stack
FLDL2T ;pushes log2 10 onto the stack
FLDL2E ;pushes log2 e onto the stack
FLDLG2 ;pushes log10 2 onto the stack
FLDLN2 ;pushes loge 2 onto the stack


Many mathematical equations can be implemented using constant and transcendental
functions. For example, x^y = 2^(y log2 x). If z = y log2 x, FYL2X can be used to calculate
z. Then F2XM1 can be used to calculate 2^z - 1. Then 1 can be added to this to get 2^z,
which is equal to x^y.


The instruction sequence would be 


FLD Y ;FYL2X takes y from ST(1), so y is loaded first
FLD X ;x is now in ST(0)
FYL2X ;ST(0) = z = y log2 x
F2XM1 ;ST(0) = 2^z - 1 (z must be within the range of F2XM1)
FLD1
FADD ;ST(0) = 2^z = x^y
FST SUM

Other frequently used functions can likewise be calculated, for example e^y and
10^y, substituting e or 10 for x in the above equations. In addition, a little creativity will
allow a programmer to use the constant and transcendental functions frequently. For
example, if the calculation log2 x is needed, the FYL2X (y log2 x) function can be used by
making y = 1. Likewise, if 2^x is needed, the F2XM1 function can be used, after which 1
can be added to the result.
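The identity x^y = 2^(y log2 x) can be checked numerically before it is coded for the coprocessor. The Python sketch below models FYL2X and F2XM1 as ordinary functions. Because F2XM1 accepts only a limited range, the sketch splits z into an integer part and a fraction and applies the integer part as a power of 2, which is the job FSCALE does on the x87. The function names are illustrative models, not real instructions or library calls, and the range split is an assumption added here rather than part of the instruction sequence above.

```python
import math


def fyl2x(x, y):
    """Model of FYL2X: y times log2 x, with x greater than 0."""
    if x <= 0:
        raise ValueError("x must be greater than 0")
    return y * math.log2(x)


def f2xm1(x):
    """Model of F2XM1: 2^x - 1 for x in the range -1 to +1."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("argument outside the range -1 to +1")
    return 2.0 ** x - 1.0


def power(x, y):
    """Compute x^y as 2^(y log2 x)."""
    z = fyl2x(x, y)
    n = round(z)                   # integer part of z (FRNDINT)
    fraction = z - n               # now between -0.5 and +0.5
    two_to_fraction = f2xm1(fraction) + 1.0   # FLD1 followed by FADD
    return math.ldexp(two_to_fraction, n)     # multiply by 2^n (FSCALE)


print(power(2.0, 10))
print(power(3.0, 2.5), 3.0 ** 2.5)
```

The two values printed on the last line should agree to within floating-point rounding.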


Processor control instructions 


Many of the following instructions have two mnemonics; the second one has an
extra N. This N instructs the assembler not to prefix the instruction with a WAIT
instruction. The no-wait forms should be used when CPU interrupts are disabled and the
8087 might generate an interrupt, which would create an endless wait. Wait forms are used
when the CPU interrupts are enabled.


FINIT or ;resets the processor
FNINIT

FDISI or ;sets the interrupt enable mask in the control
;word
FNDISI ;thereby disabling interrupts in the 8087

FENI or ;clears the interrupt enable mask in the
;control word
FNENI ;thereby enabling interrupts in the 8087


FLDCW src ;replaces the control word with the contents
;of src


FSTCW dest or ;writes the control word to dest
FNSTCW


FSTSW dest or ;writes status word to dest
FNSTSW


FCLEX or ;clears exception flags, busy flag
FNCLEX ;and interrupt request flag in the status word


FSAVE dest or;writes to dest the 94-byte save area, which 
FNSAVE ;includes the environment and the stack 


FRSTOR src ;restores the 94-byte save area from src 


FSTENV dest or;stores environment (control, status 
FNSTENV ;and tag words and exception pointers) to dest 


FLDENV src ;restores environment previously saved with
;FSTENV or FNSTENV


FINCSTP ;increments status word's stack pointer
FDECSTP ;decrements status word's stack pointer


FFREE dest ;marks dest as an empty register 


FNOP ;stores ST(0) to ST(0),
;therefore performs no
;operation


FWAIT ;same as CPU's wait instruction
;used to synchronize CPU and 8087


Control word

bits 15-13   reserved
bit 12       IC   infinity control: 0 = projective, 1 = affine
bits 11-10   RC   rounding control: 00 = round to nearest or even, 01 = round down,
                  10 = round up, 11 = chop (truncate)
bits 9-8     PC   precision control: 00 = 24 bits, 01 = reserved, 10 = 53 bits,
                  11 = 64 bits
bit 7        IEM  interrupt-enable mask: 0 = interrupts enabled,
                  1 = interrupts disabled (masked)
bit 6        reserved
bits 5-0     PM, UM, OM, ZM, DM, IM   exception masks (1 = exception masked):
                  precision, underflow, overflow, zero divide, denormalized operand,
                  invalid operation

Status word

bit 15       B    busy
bit 14       C3   condition code
bits 13-11   ST   stack top pointer: 000 = register 0 is stack top,
                  001 = register 1 is stack top, ..., 111 = register 7 is stack top
bits 10-8    C2, C1, C0   condition code
bit 7        IR   interrupt request
bit 6        reserved
bits 5-0     PE, UE, OE, ZE, DE, IE   exception flags (1 = exception occurred):
                  precision, underflow, overflow, zero divide, denormalized operand,
                  invalid operation

Figure 20-3. 8087 Control and Status Word
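A control word value read with FSTCW can be taken apart field by field. The Python sketch below assumes the 8087 control word layout of Figure 20-3 (exception masks in bits 0 to 5, interrupt-enable mask in bit 7, precision control in bits 8 and 9, rounding control in bits 10 and 11, infinity control in bit 12). The function name, the dictionary keys, and the sample value 03FFH are illustrative choices only.

```python
ROUNDING = {0: "round to nearest or even", 1: "round down",
            2: "round up", 3: "chop (truncate)"}
PRECISION = {0: "24 bits", 1: "reserved", 2: "53 bits", 3: "64 bits"}
MASK_NAMES = ["IM", "DM", "ZM", "OM", "UM", "PM"]   # bits 0 through 5


def decode_control_word(cw):
    """Split a 16-bit 8087 control word into its named fields."""
    return {
        "masked": [name for bit, name in enumerate(MASK_NAMES)
                   if cw >> bit & 1],
        "interrupts_disabled": bool(cw >> 7 & 1),
        "precision": PRECISION[cw >> 8 & 3],
        "rounding": ROUNDING[cw >> 10 & 3],
        "infinity": "affine" if cw >> 12 & 1 else "projective",
    }


for field, value in decode_control_word(0x03FF).items():
    print(field, "=", value)
```

Changing the rounding mode follows the reverse path: store the control word, alter bits 10 and 11, and reload it with FLDCW.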




PROBLEMS 


SECTION 20.1: MATH COPROCESSOR AND IEEE FLOATING-POINT STAN- 
DARDS 


1. What is the disadvantage of using a general-purpose processor to perform math
operations?
2. The IEEE single-precision standard uses ____ bytes to represent a real number.
3. The IEEE double-precision standard uses ____ bytes to represent a real number.
4. Show the bit assignment of the IEEE single-precision standard.
5. Convert (by hand calculation) each of the following real numbers to IEEE
single-precision standard.

WSS (b) 89.125 (c) -1022.543 (d) —0.00075 

6. Use the last four digits of your ID number and put the decimal point in the middle. 
Convert it to single-precision IEEE standard (e.g., 9823 is 98.23). 

7. What data types are called short real and long real in Intel's literature? 

8. Show the bit assignment of the IEEE double-precision standard. 

9. In single-precision FP (floating point), the biased exponent is calculated by adding
____ to the ____ portion of a scientific binary number.
10. In double-precision FP, the biased exponent is calculated by adding ____ to
the ____ portion of a scientific binary number.
11. Convert the following to double-precision FP. 
(a) 12.9823 (b) 98.76123 


12. How many bits are set aside for the magnitude portion of Intel's long integer? 
13. Which bits of packed decimal are used for the sign? 

14. Packed decimal uses only ____ bits of an 80-bit operand.

15. In Intel's temporary real, the data type is ____ bytes wide.


SECTION 20.2: x87 INSTRUCTIONS AND PROGRAMMING 


16. Indicate the data directive used for the following data types. 
(a) single-precision FP 
(b) double-precision FP 
(c) packed decimal 
17. Using the assembler of your choice, verify your calculation of Problems 5 and 11. 
18. Write and run an x87 program to calculate z = (x^2 + y^3)^(1/2), where x = 3.12 and y =
5.43.
19. Write and run an x87 program to calculate y = 2x^2 + 5x + 12.34, where x = 1.25.
20. Write and run an x87 program to calculate the area of a circle if r = 25.5. 
21. Write and run an x87 program to calculate 3(pi r^3)/4 if r = 25.5.
22. Write and run an x87 program to calculate the sine of a 45-degree angle. 
23. Which of the following processors have an on-chip coprocessor? 
(a) 80286 
(b) 80386 
(c) 80486 
(d) Pentium 
24. For the x87 to calculate a trig function, the angle must be in ____
(degrees, radians).
25. Write the 80387 program to calculate the sin and cos of a 30-degree angle. Run and 
analyze your program. 




ANSWERS TO REVIEW QUESTIONS 


SECTION 20.1: MATH COPROCESSOR AND IEEE FLOATING-POINT STANDARDS 


1. True
2. True
3. 32
4. 64
5. 7FH
6. 3FFH


SECTION 20.2: x87 INSTRUCTIONS AND PROGRAMMING 


1. DD (define double word)
2. DQ (define quad word)
3. 8
4. LIFO
5. True
6. ST(0)
7. Below
8. It means ST(4) + ST(0) and the result is placed in ST(0).
9. To initialize the registers to the top of the stack
10. I
11. The assembler generates the opcode for the ADD instruction to be used by the x86
while it produces the opcode for the FADD instruction to be used by the x87.
12. False. 387 and later coprocessors have the FSIN instruction.




CHAPTER 21


386 MICROPROCESSOR: REAL vs. PROTECTED MODE


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> List the additional features implemented on the 80186, 80286, and 80386
>> State the purpose of designing two modes, real and protected, into
the 80386
>> Code Assembly language instructions using the new scaled index
addressing mode of the 386
>> Code Assembly language instructions using the new instructions of
the 386
>> State the purpose of each pin of the 80386 microprocessor
>> Describe the data misalignment problem with 386 programs and discuss
how to resolve the problem
>> Describe how protection of user and system programs is accomplished in
the 386 in protected mode
>> Contrast and compare the two methods of virtual memory
implementation: paging and segmentation
>> Describe the methods of converting from logical to physical addresses in
386 protected mode


This chapter emphasizes unique features of the 80386 microprocessor, from both 
hardware and software perspectives. In Section 21.1 we look at the 386 in real mode. The 
hardware of the 386 is examined in Section 21.2. Section 21.3 provides an introduction to 
protected mode of the 386. 


SECTION 21.1: 80386 IN REAL MODE


In this section we look first at Intel's 80186 microprocessor and then unique fea- 
tures of the 286 and 386 from the perspective of real mode programming. 


Evolution of x86 from 80186 to 80386 


Intel has a very successful product called the 80186 (and 80188). This chip is
alive and doing very well in the embedded controller market, where it is used to replace
multiple devices with a single component. Prior to the introduction of the 80186/88, Intel
did a survey and found that many designers were using the 8088/86 along with other
peripheral chips, such as the 8237 DMA controller, the 8254 timer, and the 8259 interrupt
controller. This led to putting a portion of these chips along with the 8088/86
microprocessor on a single chip and calling it the 80186/88. Internally, the 80186 and
80188 are identical, but externally the 80186 has a 16-bit data bus and the 80188 has an
8-bit external data bus. In this regard they are similar to the 8086 and 8088. The address
bus is still 20 bits wide, giving a 1-megabyte memory system. The data bus is multiplexed
with the address bus. The 80186/88 is a 68-pin chip that includes the following on-chip
functions: (1) clock generator, (2) two 20-bit DMA channels, (3) three 16-bit programmable
counters, (4) interrupt controller, (5) programmable wait-state generator, and
(6) programmable chip select decoder unit. Although very few 80186/88 microprocessors are
used in PCs, millions of them are found in embedded systems. The 80186/88 microprocessor
supports all 8088/86 instructions in addition to some new ones. The new instructions of
the 80186/88 are as follows:


BOUND dest, source 

ENTER disp, level 

LEAVE 

IMUL result,source,immediate data 
INS dest, port 

OUTS port,dest 

SAR dest,immediate count 
SHR 

SAL 

RCR 

ROR 

RCL 

ROL 

PUSH immediate data 

PUSHA 

POPA 


Some of the above instructions, such as ENTER and LEAVE, are intended for
implementation in high-level languages but many others can be used in everyday
Assembly language programs. For example, look at the shift and rotate instructions. In
the 8088/86, shifting or rotating an operand more than once required putting the count in
CL; immediate operands could not be used. Starting with the 80186/88, immediate counts
are allowed. Look at Example 21-1.

Other useful new instructions of the 80186/88 are PUSHA (push all) and POPA
(pop all). Very often in writing a procedure (subroutine) all the registers need to be saved
on the stack. In the 8088/86 one must code PUSH and POP for each 16-bit register
separately; however, in the 80186/88 the use of PUSHA and POPA can save a lot of coding.


Example 21-1

Show Assembly language code to shift operand 26H right by 5 bits in each of the following systems.
(a) 8088/86 (b) 80186/88


Solution:
(a) 8088/86          (b) 80186/88


MOV AL,26H           MOV AL,26H
MOV CL,5             SHR AL,5
SHR AL,CL


Example 21-2 
Show a sequence of x86/88 instructions equivalent to 80186/88 PUSHA and POPA. 


Solution: 
8088/86              80186/88


PUSH AX              PUSHA
PUSH CX
PUSH DX
PUSH BX
PUSH SP
PUSH BP
PUSH SI
PUSH DI


POP DI
POP SI
POP BP
POP SP               ;POPA discards this value instead of loading SP
POP BX
POP DX
POP CX
POP AX               POPA
RET                  RET


Note that POPA restores all registers except SP, which is ignored. In other words, it does not 
disturb the present stack frame. 


See Example 21-2. 

One can test these new instructions on 286 PCs and later machines. However, 
using these instructions means that the program will not run on 8088/86 machines. There 
are two dominant trends in software for x86-based systems. 


1. Software that runs on any x86 machine, including 8088/86-based systems. 
2. Software that is 32-bit 386 based and must be run on 386 and higher machines. 


Also note that the DEBUG utility does not support the new 80186/88 instructions 
since it is intended to run on any x86 PC, including the 8088/86. 




80286 microprocessor 


The demand for a more powerful CPU led Intel Corporation to use more than 
100,000 transistors to design a new microprocessor called the 80286. This processor is 
downwardly compatible with the 80186/88 processor. The 80286 microprocessor was a 
major improvement over the core 8086 in the following ways. 


1. There are separate pins for the address and data buses and thus no need for demul- 
tiplexing the buses as was the case in the 8088/86 microprocessor. This increased 
number of pins required abandoning DIP (dual in-line packaging). Instead, PGA 
(pin grid array) packaging was chosen. 

2. The memory cycle time was reduced to 2 clocks from 4 clocks in the 8088/86. 
This made memory interfacing quite a challenge, especially for frequencies of 20 
MHz and beyond. Memory design of high-performance computers is discussed in 
Chapter 22. 

3. Introduction of virtual memory in the 80286 was the most drastic change over the
8088/86. The 80286 works in two different modes: real mode and protected mode. 
In real mode, the 80286 is simply a faster 8086, capable of handling only 1M byte 
of memory. It executes all the instructions of the 8086 with fewer clock cycles, as 
shown in Appendix E. In order to use the entire 16 megabytes of memory space, 
the 80286 must work in protected mode. When the 80286 microprocessor is turned 
on, it automatically starts from real mode and can be switched to protected mode. 
When in protected mode, all the address lines A0-A23 can be used, thereby giving
a total of 16M bytes of addressable physical memory (RAM and ROM). It is in this 
mode that most of the changes over the 8086 have been introduced into the 80286. 
Due to the declining price of the 80386, very few systems use the 286 in protected 
mode; therefore, we bypass any discussion of the 80286 in protected mode. 
However, since protected mode of the 80286 is a subset of the 80386, many of the 
concepts of 386 protected mode apply to the 286. 


Major changes in the 80386 


The 80386 microprocessor started a new trend in the x86 family. Although it is
downwardly compatible with the 8088/86 and 80286, there are some major changes in its
architecture. The following are some of the major changes that have been introduced in
the 80386.


1. The data bus was increased from 16 bits to 32 bits, both internally and externally.

2. All the registers were extended to 32 bits, thereby making the 80386 a 32-bit
microprocessor.

3. The address bus was increased to 32 bits, thus providing 4 gigabytes (2^32) of
physical memory addressing capability.

4. The paging virtual memory mechanism was introduced, making the 80386 capable
of using both segmentation and paging. More about paging and segmentation is
provided in Section 21.3.

5. A new addressing mode called scaled index was added.

6. Many new bit-manipulation instructions were added. These instructions work in
both real mode and protected mode.

7. The 386 can be switched from protected to real mode by software. This is a major
improvement over the 286, which had to be reset to switch back to real mode.


To reduce the cost of board design, Intel made the 80386SX microprocessor available
with a 16-bit external data bus, but internally it remained a 32-bit processor, 100%
compatible with the 80386. In terms of memory bandwidth, it is slower than the 80386
since it takes two memory cycles (each 2 clock cycles) to access a 32-bit doubleword
instead of only 1 memory cycle, as is the case in the 80386. Intel also made the 80386SX
with only a 24-bit address bus, the same as the 80286. In other words, the 80386SX is the
same as the 80286 externally, but internally it is a 32-bit processor fully compatible with
all 386 computers.

In real mode, the 80386 can access a maximum of 1 megabyte using address pins
A19-A0. However, in protected mode the 80386 can access 4 gigabytes of memory
by using the 32-bit address bus.


General Data and Address Registers (bits 31-0): EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
Segment Selector Registers (bits 15-0): CS (code), SS (stack), DS, ES, FS, GS (data)
Instruction Pointer and Flags Register (bits 31-0): EIP, EFLAGS (bits 15-0 are FLAGS)

Figure 21-1. Selected Intel 386 Registers


80386 real mode programming


In the design of the 80386, Intel made such massive design changes that it is
radically different from the 80286, yet it is still capable of running all the code written
for the 286 and 8088/86. Next we describe some of these new features that are available in
both real and protected modes.


32-bit registers 


In the 80386, register sizes were extended from 16 bits to 32 bits and register 
names were changed to reflect this. For example, EAX is the extended AX, EBX is the 
extended BX, and so on. See Figure 21-1. In order to access 32-bit registers, the letter E 
must be included in the coding. The four general-purpose registers AX, BX, CX, and DX 
are still accessible in their 8086 formats in addition to the extended format. For example, 
register EAX is accessible as AL, AH, AX, and EAX. Notice that the upper 16-bit part is 
not accessible as a separate register. One way to access it is to shift EAX right. See 
Example 21-3. 


Example 21-3 


Load EAX with 7698E35FH and move it among the 8-, 16-, and 32-bit registers of the 386. 


Solution: 


MOV EAX,7698E35FH ;EAX=7698E35F (AX=E35F,AH=E3,AL=5F)
MOV EDX,EAX ;EDX=EAX=7698E35F
MOV CH,AL ;CH=AL=5F
MOV DI,AX ;DI=AX=E35F
MOV ESI,EDX ;ESI=EDX=7698E35F
ROR EAX,16 ;rotate EAX right 16 times (EAX=E35F7698H)
MOV BX,AX ;BX=AX=7698H
MOV CL,AL ;CL=AL=98H
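The register overlap used in Example 21-3 can be modeled in a few lines. The Python sketch below treats EAX as a 32-bit integer, derives AX, AH, and AL from it, and rotates it right by 16 bits to bring the upper half into AX. The helper names sub_registers and ror32 are illustrative only.

```python
def sub_registers(eax):
    """Return the 8-, 16- and 32-bit views of a 32-bit register value."""
    eax &= 0xFFFFFFFF
    return {
        "EAX": eax,
        "AX": eax & 0xFFFF,      # lower 16 bits
        "AH": eax >> 8 & 0xFF,   # bits 15-8
        "AL": eax & 0xFF,        # bits 7-0
    }


def ror32(value, count):
    """Rotate a 32-bit value right by count bit positions (model of ROR)."""
    count %= 32
    value &= 0xFFFFFFFF
    return (value >> count | value << (32 - count)) & 0xFFFFFFFF


before = sub_registers(0x7698E35F)
after = sub_registers(ror32(0x7698E35F, 16))
print({name: hex(v) for name, v in before.items()})
print({name: hex(v) for name, v in after.items()})
```

After the rotate, AX holds what used to be the upper 16 bits of EAX, which is how the last two MOV instructions of the example obtain 7698H and 98H.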




Which end goes first? 


In storing data, the 386 followed the tradition of the 8086/286 in placing the least 
significant byte (little end of the data) in the low address. As discussed previously, this is 
referred to as little endian. See Example 21-4. 


Example 21-4 


Show how data is placed after execution of the following code. 
MOV EAX,7698E39FH ;EAX=7698E39F
MOV [4524],AX
MOV [8000],EAX


Solution:


For "MOV [4524],AX" we have
DS:4524 = (9F)
DS:4525 = (E3)

and for "MOV [8000],EAX" we have
DS:8000 = (9F)
DS:8001 = (E3)
DS:8002 = (98)
DS:8003 = (76)
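The byte placement of Example 21-4 can be reproduced with a small memory model. The Python sketch below assumes a 64K data segment held in a bytearray and treats the offsets 4524 and 8000 as hex, as the solution does. The helper name store is illustrative only.

```python
def store(memory, offset, value, size):
    """Write value at offset, least significant byte first (little endian)."""
    memory[offset:offset + size] = value.to_bytes(size, "little")


def run_example():
    memory = bytearray(0x10000)               # one 64K data segment
    eax = 0x7698E39F
    store(memory, 0x4524, eax & 0xFFFF, 2)    # MOV [4524],AX
    store(memory, 0x8000, eax, 4)             # MOV [8000],EAX
    return memory


memory = run_example()
print(memory[0x4524:0x4526].hex())   # prints 9fe3
print(memory[0x8000:0x8004].hex())   # prints 9fe39876
```

Passing "big" instead of "little" to to_bytes gives the byte order used by big endian processors, with 76H at the low address.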


In Example 21-4, notice how the least significant byte (the little end of the data)
9FH goes to the low address 8000, and the most significant byte of the data 76H goes to
the high address 8003. This means that the little end of the data goes in first, hence the
name little endian. In the Motorola 68000 family, data is stored the opposite way: The big
end (most significant byte) goes into the low address first, and for this reason it is called
big endian. Some recent RISC processors, such as PowerPC (developed jointly by IBM
and Motorola), allow selection of mode, big endian or little endian. The software overhead
of converting from one camp to the other led Intel to introduce a new instruction called
BSWAP in the 80486, specifically to take care of this problem (see Chapter 23).


General registers as pointers 


Another major change introduced in the 80386 is the use of general registers such 
as EAX, ECX, and EDX as pointers. As you might recall from Chapter 1, the 8088/86/286 
can use only BX, SI, and DI as pointers into the data segments. But starting with the 386, 
all 32-bit general-purpose registers can be used for pointers into data segments. Look at 
the following cases for valid and invalid instructions. 


MOV BX,WORD PTR [EAX] ;move into BX word pointed to by EAX
MOV BX,WORD PTR [AX] ;invalid AX can't be used as pointer
MOV EAX,DWORD PTR [ECX] ;move into EAX DWORD pointed to by ECX
MOV AL,BYTE PTR [EDX] ;move into AL BYTE pointed to by EDX
MOV EBX,WORD PTR [CX] ;invalid CX can't be used as pointer
MOV EAX,DWORD PTR [EDI] ;move into EAX DWORD pointed to by EDI


The 386 also allows the use of displacement for 32-bit register pointers. 
Therefore, instructions such as "MOV AL,[ECX+100]" are perfectly valid. Of course, the 
386 supports all the addressing modes of the 8086/286 discussed in Chapter 1. 
Table 21-1 shows some of the addressing modes supported by the 386. 


Scaled index addressing mode 


One of the most powerful addressing modes introduced in the 386 is scaled index
addressing mode. It allows access of multidimensional arrays with ease. In scaled index
addressing mode, any of the 32-bit registers, except ESP, can be used as a pointer that is
multiplied by a scale factor of 1, 2, 4, or 8. The scaling (multiplication) factors 1, 2, 4,
and 8 correspond to byte, word, doubleword, and quadword operands, respectively. Look
at Example 21-5 to see how the effective address is calculated in cases where the scaled
index addressing mode is used. Only the 32-bit register pointers can be used for this mode.
They are shown in Table 21-2.


Table 21-1: Addressing Modes for the 80386

Addressing Mode          Operand                                      Default Segment
Register                                                              None
Immediate                                                             None
Direct                                                                DS
Register indirect
Based relative           [BX]+disp, [BP]+disp, [EAX]+disp,
                         [EBX]+disp, [ECX]+disp, [EDX]+disp,
                         [EBP]+disp
Indexed relative         [DI]+disp, [SI]+disp, [EDI]+disp,
                         [ESI]+disp
Based indexed relative   [R1][R2]+disp
                         (R1 and R2 are any of the above)

Note: In based indexed relative addressing, disp is optional.

Table 21-2: 386 Scaled Index Addressing Mode

Example 21-6 shows how the scaled index addressing mode is used. It must be noted
that we cannot use a 16-bit register as a scaled index. In other words, the instruction
"MOV AL,[ESI+BX*4]" is invalid.

It must be noted that for an Assembly language program to be run under DOS, the
effective address should not exceed FFFFH. In other words, if EBX is used as a pointer,
you must make sure that the upper 16 bits of the EBX register are all zero, since DOS
works in real mode.




Example 21-5 


Find the effective address in each of the following cases. Assume that ESI = 200H, ECX = 100H, 
EBX = 50H, and EDI = 100H. 

(a) MOV AX,[2000+ESI*4] (b) MOV AX,[5000+ECX*2] 

(c) MOV ECX,[2400+EBX*4] (d) MOV DX,[100+EDI*8] 


Solution: 

(a) EA (effective address) is 2000H + 200H x 4 = 2000H + 800H = 2800H. Therefore, the
logical address of the operand moved into AX is DS:2800H.

(b) By the same token we have EA = 5000H + 100H x 2 = 5000H + 200H = 5200H.

(c) EA = 2400H + 4 x 50H = 2400H + 140H = 2540H.

(d) EA = 100H + 8 x 100H = 100H + 800H = 900H.
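The arithmetic of Example 21-5 can be checked with a short Python sketch. It treats the displacements as hex values, as the solution does, and rejects scale factors other than 1, 2, 4, and 8. The function name effective_address is an illustrative choice.

```python
def effective_address(displacement, index, scale):
    """Return displacement + index * scale as a 32-bit effective address."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4, or 8")
    return (displacement + index * scale) & 0xFFFFFFFF


cases = [
    ("(a)", 0x2000, 0x200, 4),   # MOV AX,[2000+ESI*4], ESI = 200H
    ("(b)", 0x5000, 0x100, 2),   # MOV AX,[5000+ECX*2], ECX = 100H
    ("(c)", 0x2400, 0x50, 4),    # MOV ECX,[2400+EBX*4], EBX = 50H
    ("(d)", 0x100, 0x100, 8),    # MOV DX,[100+EDI*8], EDI = 100H
]
for label, disp, index, scale in cases:
    print(label, format(effective_address(disp, index, scale), "X") + "H")
```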


Example 21-6 


Using the scaled index addressing mode, write an Assembly language program to add 5 operands of
32-bit size and save the result.


Solution:


.MODEL SMALL
.386
.STACK 300H
.DATA

MYDATA DD 234556H,0F983F5H,6754AE2H,0C5231239H,0AF34ACB4H

RESULT DQ ?
.CODE
MOV AX,@DATA
MOV DS,AX
SUB EBX,EBX ;EBX=0
MOV EDX,EBX ;clear EDX
MOV EAX,EBX ;clear EAX
MOV CX,5 ;set the counter to 5
BACK: ADD EAX,[MYDATA+EBX*4] ;add the 32-bit operand
ADC EDX,0 ;save the carry
INC EBX ;point to next 32-bit data
DEC CX ;decrement the counter
JNZ BACK ;repeat until counter is zero
MOV DWORD PTR RESULT,EAX ;save the lower 32 bits
MOV DWORD PTR RESULT+4,EDX ;save the upper 32 bits
;place code here to return to OS


In this program, we first define the 32-bit data using the DD directive, and the RESULT is defined 
as 64-bit using the DQ directive. Notice that EBX is initially zero; therefore, the instruction "ADD 
EAX,[MYDATA +EBX*4]" adds the first 32-bit operand to EAX since the effective address is 
MYDATA. "INC EBX" makes EBX = 1; therefore, in the next iteration the effective address is 
[MYDATA+1 * 4], and likewise in the next iteration the effective address is [MYDATA+2 * 4], 
which is MYDATA+8, and so on. For example, if the offset address for MYDATA is 2000H, the 
effective address is 2000H for the first iteration, 2004H for the second iteration, 2008H for the third 
iteration, 200CH for the fourth iteration, and so on. 
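The loop of Example 21-6 can be traced with a Python model that keeps EAX and EDX as separate 32-bit values, so the role of "ADC EDX,0" is visible. The function name add_dwords is illustrative; the data values are those of MYDATA.

```python
MYDATA = [0x234556, 0x0F983F5, 0x6754AE2, 0xC5231239, 0xAF34ACB4]


def add_dwords(data):
    """Add 32-bit operands, returning (EDX, EAX) as the 64-bit result."""
    eax = 0
    edx = 0
    for ebx in range(len(data)):           # EBX is the scaled index
        total = eax + data[ebx]            # ADD EAX,[MYDATA+EBX*4]
        eax = total & 0xFFFFFFFF
        carry = total >> 32                # carry flag out of bit 31
        edx = (edx + carry) & 0xFFFFFFFF   # ADC EDX,0
    return edx, eax


edx, eax = add_dwords(MYDATA)
print("EDX =", format(edx, "08X"), "EAX =", format(eax, "08X"))
```

EDX ends up counting how many times the 32-bit sum in EAX wrapped around, which is why RESULT is defined as a quadword.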




Some new 386 instructions 


There are many new instructions in the 386 that work in both real and protected 
modes. A detailed look at each new instruction and how it is used is beyond the scope of 
this volume. Here are some of the new instructions with examples. 


MOVSX and MOVZX instructions


As we discussed in Chapter 6, the 8086 has sign-extend instructions such as CBW
(D7 of AL is copied into all the AH bits) and CWD (D15 of the AX is copied into all bits
of the DX). In the 386, there is a new instruction CDQ (convert doubleword to quadword)
in which the sign bit of EAX, D31, is copied to all the bits of EDX. Notice that all of these
sign-extend instructions extend only the sign of the accumulator. To overcome this
limitation, Intel introduced the MOVSX and MOVZX instructions. In the MOVSX, the sign
bit of any register (or even a memory location) can be extended (copied) into any register.
Similarly, MOVZX zero-extends the contents of a register or memory location. The MOVSX
instruction is used to sign-extend the operand in signed number arithmetic to prevent
overflow problems. The MOVZX instruction is used in unsigned arithmetic. Look at Example
21-7.


Example 21-7 


Find the contents of destination registers after execution of the following code. 


(a) MOV BL,-5             (b) MOV DL,+9
    MOVSX CX,BL               MOVSX EBX,DL
(c) MOV AL,95H            (d) MOV BH,83H
    MOVZX ECX,AL              MOVZX AX,BH


Solution: 


MOVSX copies the source register into the lower bits of the destination register and copies the sign 
bit into all upper bits of the destination register. Therefore, we have the following. 


(a) MOV BL,-5 ;BL=1111 1011B =FBH (2's complement)
MOVSX CX,BL ;CL=FBH,CH=FF since BL is copied into
;CL and the sign bit (D7) is copied into
;all CH bits. BL is unchanged


(b) DL = 0000 1001B = 09H. Then BL = 09 and D8-D31 of EBX are all zero, the sign bit of DL.
Therefore, EBX = 00000009H.


(c) MOV AL,95H ;AL =1001 0101B =95H
MOVZX ECX,AL ;CL =95H and D8-D31 of ECX are all zeros
;therefore, ECX =00000095H


(d) BH = 1000 0011B = 83H. Then AL = BH = 83H and D8-D15 of AX are all zeros.
Therefore, AX = 0083H.
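The two kinds of extension in Example 21-7 can be modeled as follows. The Python sketch takes the source value and the source and destination widths in bits; the function names mirror the mnemonics but are only illustrative models.

```python
def movzx(value, src_bits, dst_bits):
    """Zero-extend: keep the source bits and clear every upper bit."""
    return value & ((1 << src_bits) - 1)


def movsx(value, src_bits, dst_bits):
    """Sign-extend: copy the source sign bit into every upper bit."""
    value &= (1 << src_bits) - 1
    if value >> (src_bits - 1) & 1:                  # sign bit set?
        upper = ((1 << dst_bits) - 1) & ~((1 << src_bits) - 1)
        value |= upper                               # fill upper bits with 1s
    return value


print(format(movsx(-5 & 0xFF, 8, 16), "04X"))   # (a) MOVSX CX,BL
print(format(movsx(9, 8, 32), "08X"))           # (b) MOVSX EBX,DL
print(format(movzx(0x95, 8, 32), "08X"))        # (c) MOVZX ECX,AL
print(format(movzx(0x83, 8, 16), "04X"))        # (d) MOVZX AX,BH
```

Note that 95H and 83H both have D7 set, so MOVSX on the same operands would have filled the upper bits with 1s instead.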


Bit scan instructions 


The 386 has new instructions allowing a program to scan an operand from LSB
to MSB or from MSB to LSB, to find the first high bit (= 1). If the scanning is done from
the least significant bit (D0) toward higher bits, the BSF (bit scan forward) instruction is
used. If the scanning is done from the most significant bit (D31) toward the lower bits, the
BSR (bit scan reverse) instruction is used. In these instructions whenever the first high bit
is found, the scanning is stopped and the position of the bit is written into the destination
register. The bit position is numbered from D0 (LSB) to D31 (MSB), regardless of the
direction of scanning. See Example 21-8.


Example 21-8 


Find the register contents after the execution of the following code. 
(a) MOV BX,4578H
BSF DX,BX ;scan BX and put the position of the first high into DX
(b) MOV ECX,3A9H
BSR EAX,ECX ;scan ECX from D31 down and put position of
;first high into EAX


Solution:


(a) DX = 03 since scanning 4578H = 0100 0101 0111 1000B from right to left yields 1 in D3.
(b) EAX = 9 since scanning 000003A9H = 0000 0000 0000 0000 0000 0011 1010 1001
from D31 toward D0 yields the first high in D9; therefore, EAX = 9.
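Both scans of Example 21-8 can be expressed with integer bit tricks. The Python sketch below returns the bit position for a nonzero operand and None for zero; returning None is a modeling choice made here for the case in which no high bit exists. The function names mirror the mnemonics but are illustrative only.

```python
def bsf(value):
    """Position of the lowest set bit (scan from D0 upward)."""
    if value == 0:
        return None
    return (value & -value).bit_length() - 1   # isolate lowest 1 bit


def bsr(value):
    """Position of the highest set bit (scan from the MSB downward)."""
    if value == 0:
        return None
    return value.bit_length() - 1


print(bsf(0x4578))   # prints 3
print(bsr(0x3A9))    # prints 9
```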


Review Questions 


1. The 80188/86 is a(n) ____ (8-bit, 16-bit) processor.

2. What is the size of the external data bus on the 80186?

3. In which x86 was the concept of virtual memory introduced?

4. In which x86 was the protected mode concept introduced?

5. The 80286 works in which of the following?

(a) real mode (b) protected mode

(c) both (a) and (b) (d) 8086 virtual mode

6. True or false. The 32-bit registers of the 386 can be accessed only in protected 
mode. 

7. Find the contents of BL, BH, BX, and EBX after execution of instruction "MOV 
EBX,99FF77AAH". 

8. The 80386 uses the (little endian, big endian) convention. 

9. List all the 32-bit registers that can be used as pointers into the data segment.

10. In the instruction "MOV EBX,[EAX+ESI*8]", find the effective address if EAX = 
2000 and ESI = 100 (both in hex). 

11. Scaled index addressing mode can be used with which of the following registers? 




(a) SI (b) EDI (c) EAX 
(Dx (e) ECX (£) CX 
12. Find the contents of EDX after execution of the following code. 
MOV DL,-9 


MOVSX EDX,DL 
13. Find the contents of ECX after execution of the following code. 
MOV DL,-5 
MOVZX ECX,DL 
14. Find the contents of DX and AX after execution of the following code. 
MOV BX,1998H 
BSF DX,BX 
BSR AX,BX 




SECTION 21.2: 80386: A HARDWARE VIEW 


We present a hardware view of the 386 in this section. To avoid confusion, Intel 
calls the 80386 with a 32-bit external data bus the 80386, and 80386SX refers to the 386 
with a 16-bit external data bus. Figures 21-2 and 21-3 provide a block diagram and pin 
layout of the 80386, respectively. Signal functions are provided in Table 21-3. 


Overview of pin functions of the 80386 


D31-D0 (data bus) 


These provide the 32-bit data path to the system board. They are grouped into
8-bit data chunks, D0-D7, D8-D15, D16-D23, and D24-D31. Each 8-bit data bus is
accessed by a separate byte enable pin (BE).


A31-A2 and BE0, BE1, BE2, BE3


These provide the 32-bit address path to the system board. Notice the absence of
A0, A1, or BHE seen in earlier generations of the x86. Since the 80386 supports data types
of byte (8 bits), word (16 bits), and doubleword (32 bits), the external buses must be able
to access any of the 4 banks of memory connected to the 32-bit data bus. BE0-BE3 are used
for bank selection. According to Table 21-4, to select D7-D0, BE0 is used, BE1 is for
D15-D8, and so on. See Figure 21-4.

Bus Selection

Data Bus     Byte Enable
D7-D0        BE0
D15-D8       BE1
D23-D16      BE2
D31-D24      BE3


Figure 21-2. 80386 Block Diagram (# indicates active low)
(Signal groups shown: address bus; 32-bit data bus D0-D31; bus control ADS#, NA#, BS16#, READY#; bus cycle definition; coprocessor signaling; bus arbitration HOLD and HLDA; interrupts INTR, NMI, RESET; 2X clock; power connections)


CHAPTER 21: 386 MICROPROCESSOR: REAL VS. PROTECTED MODE 539 


[Figure 21-3: pin layout of the 80386]


Figure 21-4. 80386 Banks (four byte-wide banks: D31-D24, D23-D16, D15-D8, D7-D0)
ADS, BS16, NA, and READY 


ADS (address status), BS16 (bus size), NA (next address request), and READY
are bus control signals. These signals allow the implementation of efficient bus control
circuitry. For example, using BS16 allows the 80386 to be connected to a 16-bit data bus
instead of the 32-bit bus. The use of NA (next address) provides the option of address
pipelining, where the address of the next memory cycle is provided in the last clock cycle
of the present memory cycle.

Table 21-4: Intel 386 DX PGA Pinout
(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1992)


W/R, D/C, and M/IO 


These signals provide the bus cycle definitions and the type of the bus cycle
according to Table 21-5.

Table 21-5: 80386 Bus Cycle Definition

M/IO   D/C   W/R   Bus Cycle Type
0      0     0     Interrupt acknowledge
0      0     1     Does not occur
0      1     0     I/O data read
0      1     1     I/O data write
1      0     0     Memory code read
1      0     1     Halt (shutdown)
1      1     0     Memory data read
1      1     1     Memory data write

(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1992)




Example 21-9 


Indicate which part of the data bus is selected for the following BEs. 
(a) BE3 BE2 BE1 BE0 = 0000 (b) BE3 BE2 BE1 BE0 = 0011
(c) BE3 BE2 BE1 BE0 = 1100 (d) BE3 BE2 BE1 BE0 = 1101


Solution: 


(a) D31-D0, the entire 32-bit data bus
(b) D31-D16, the upper 16-bit data bus
(c) D15-D0, the lower 16-bit data bus
(d) the 8 bits of D15-D8


Figure 21-5 shows the above data selection graphically. Note that BE is active low. 


Figure 21-5. Graphical Representation of Example 21-9 (Selected Byte Is Shaded)
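
The byte-enable decoding of Example 21-9 can also be checked with a few lines of code. The following Python sketch is only an illustration; the function name selected_lanes and the string form of the BE pattern are our own assumptions, not part of any Intel tool. It treats each BE as active low, as noted above.

```python
def selected_lanes(be_bits):
    """Return the data-bus sections selected by a BE3 BE2 BE1 BE0 pattern.

    be_bits is a 4-character string written in the order BE3 BE2 BE1 BE0,
    for example "0011". A 0 means the byte enable is asserted (active low).
    """
    lanes = ["D31-D24", "D23-D16", "D15-D8", "D7-D0"]  # same order as BE3..BE0
    return [lane for bit, lane in zip(be_bits, lanes) if bit == "0"]


# The four cases of Example 21-9
for pattern in ("0000", "0011", "1100", "1101"):
    print(pattern, selected_lanes(pattern))
```

Each asserted (low) BE adds one 8-bit section of the data bus to the result, which is the same reading used in the solution of the example.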
Example 21-10 


An 80386 system is advertised as 33 MHz. What frequency is connected to CLK2?


Solution: 


CLK2 = 66 MHz because the frequency connected to CLK2 is always twice the system frequency. 
CLK2 


This provides the timing for the 386. The frequency connected to CLK2 is always 
twice the system frequency. For example, a 16-MHz 386 system requires CLK2 to be 32 
MHz. 


RESET 


This is a level-sensitive input signal into the 80386. When a low-to-high signal is
applied to RESET, the 80386 will suspend all operations and the registers are initialized
to fixed values. The RESET state of EIP and CS must be noted, along with the state of
A31-A2 and BE0-BE3, because this has some major implications as far as where the boot
ROM should be located (see Table 21-6). This means that the microprocessor will fetch
the first opcode from memory location FFFFFFF0H. This is 16 bytes below the top of the
4-gigabyte address range (FFFFFFFFH). At this location, there is either a JMP FAR or a
CALL FAR instruction. Upon executing the JMP or CALL instruction, the 386 makes A31-A20
all zero, thereby forcing it to stay within the 1-megabyte address range of real mode. This
is the case for all 386, 486, and Intel Pentium chips. All these processors wake up in real
mode, but the address where the first opcode must be found is located in the extended
memory space and not in the first megabyte address space of real mode. This means that
for 386 and higher PC systems, there are duplicate ROMs in both the 4-gigabyte and
1-megabyte address spaces, as shown in Figure 21-6.

Table 21-6: RESET State
CS     F000
EIP    0000FFF0
(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1992)


Figure 21-6. BIOS ROM Duplicate for 386/486/Pentium PC
(Memory map from 00000000 to FFFFFFFF: BIOS ROM at 000F0000-000FFFFF and its duplicate at FFFF0000-FFFFFFFF)
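
The arithmetic behind the reset address can be verified with a short calculation. The Python sketch below is an illustration only; the names RESET_FETCH and after_far_jump are our own, and the masking simply models the statement that A31-A20 become zero after the far JMP or CALL.

```python
RESET_FETCH = 0xFFFFFFF0      # address of the first opcode fetch after RESET
TOP_OF_4GB = 0x1_0000_0000    # one past FFFFFFFFH


def after_far_jump(address):
    """Model A31-A20 being forced to zero once the far JMP/CALL executes."""
    return address & 0x000FFFFF


def real_mode_address(segment, offset):
    """Real-mode SEG:OFFSET calculation (segment shifted left by 4 bits)."""
    return (segment << 4) + offset


print(TOP_OF_4GB - RESET_FETCH)               # distance from the top of 4 gigabytes
print(hex(after_far_jump(RESET_FETCH)))       # same location inside the first megabyte
print(hex(real_mode_address(0xF000, 0xFFF0)))  # CS:IP = F000:FFF0 in real mode
```

Both the masked address and the real-mode F000:FFF0 calculation land in the last 16 bytes of the first megabyte, which is why the BIOS ROM appears at both ends of the memory map.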


The remaining signals of the 386 are similar to those for the 80286, and readers 
can refer to Chapter 10 for their meanings. 


Bus bandwidth in the 386 


With zero wait states, it takes the 386 two clocks to perform the read or write
cycle. A 2-clock bus cycle is standard in all high-performance microprocessors, including
RISC processors. This leads to a very high bus bandwidth. The two clocks of the 386
memory (or I/O) bus cycle time for zero wait states are shown in Figure 21-7. In the case
of pipelined read/write cycle time, the next address is provided in the last T clock of the
present cycle, thus providing some extra time for the decoder logic circuitry, path delay,
and memory access time. Although in pipelined mode the next address is provided in the
last stage of the present cycle, the read and write cycle time still consists of 2 clocks for
the zero-wait-state system.

Figure 21-7. 386 Bus Cycle Time (Nonpipelined)


Example 21-11 


Calculate the 386 bus bandwidth of a 33-MHz system with each of the following. 
(a) 0 WS (b) 1 WS 


Solution: 


With the T state of 30 ns (1/33 MHz = 30 ns), we have memory cycle time of 60 ns and 90 ns for 
(a) and (b), respectively. 

(a) The bus bandwidth is (1/60 ns) x 4 = 66.66 megabytes/second.

(b) The bus bandwidth is (1/90 ns) x 4 = 44.44 megabytes/second. 
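
The calculation in Example 21-11 generalizes to any clock period and number of wait states. The following Python sketch is offered as an illustration; the function name and its parameters are our own assumptions. It uses the 2-clock bus cycle described above and adds one T state per wait state.

```python
def bus_bandwidth_mb_per_s(t_state_ns, wait_states, bus_bytes=4):
    """Bus bandwidth in megabytes/second (1 megabyte = 10**6 bytes here).

    t_state_ns  : duration of one T state in nanoseconds
    wait_states : number of wait states added to the 2-clock bus cycle
    bus_bytes   : bytes transferred per bus cycle (4 for a 32-bit data bus)
    """
    cycle_ns = (2 + wait_states) * t_state_ns
    # bytes per nanosecond times 1000 gives millions of bytes per second
    return bus_bytes / cycle_ns * 1000.0


print(round(bus_bandwidth_mb_per_s(30, 0), 2))  # 0 WS case of Example 21-11
print(round(bus_bandwidth_mb_per_s(30, 1), 2))  # 1 WS case of Example 21-11
```

Passing bus_bytes=2 gives the corresponding figure for a 16-bit external data bus.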


Data misalignment in the 386 


The case of misaligned data has a major effect on the 386 bus performance. If the
data is aligned, for every memory read cycle the 80386 brings in 4 bytes of data using the
D31-D0 data bus. Such data alignment is referred to as doubleword alignment. To make
data doubleword aligned, the least significant digit of the hex address must be 0, 4, 8,
or C (hex). Look at Example 21-12.


Example 21-12 


Show the data transfer of the following cases and indicate the memory cycle time if the system 
frequency is 25 MHz. Assume that EAX = 4598F31EH and the system is in real mode. 

(a) MOV [2950],EAX 

(b) MOV [299A],EAX 


Solution: 


The system frequency of 25 MHz makes the cycle time 80 ns (1/25 MHz = 40 ns and each memo- 
ry cycle is 2 clocks, giving 80 ns). 


(a) In this instruction, the 4-byte content of EAX is moved to memory location with starting offset 
address of 2950H on the 32-bit data bus of D31—D0. This address is doubleword aligned since the 
least significant digit is 0. Therefore, it takes only one memory cycle or 80 ns to transfer the data. 
(b) In the first memory cycle, locations with addresses of 2998H, 2999H, 299AH, and 299BH are 
accessed, but only 299AH and 299BH are used for storing AL and AH. In the second memory cycle, 
the address offsets of 299CH, 299DH, 299EH, and 299FH are accessed where only 299CH and
299DH are used to store the upper 16 bits of EAX. This means that we have a total of 160 ns.
If possible, this must be avoided since nonaligned data slows the data access.
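
The cycle counts of Example 21-12 follow from counting how many aligned doublewords an operand touches. The Python sketch below is only an illustration; the function names are our own, and it assumes a zero-wait-state system with 2 clocks per memory cycle as in the example.

```python
def bus_cycles(offset, size_bytes, bus_bytes=4):
    """Number of memory cycles needed to transfer size_bytes starting at offset."""
    first_dword = offset // bus_bytes
    last_dword = (offset + size_bytes - 1) // bus_bytes
    return last_dword - first_dword + 1


def transfer_time_ns(offset, size_bytes, clock_mhz, clocks_per_cycle=2):
    """Total transfer time in ns, assuming zero wait states."""
    t_state_ns = 1000.0 / clock_mhz
    return bus_cycles(offset, size_bytes) * clocks_per_cycle * t_state_ns


print(transfer_time_ns(0x2950, 4, 25))  # case (a): doubleword aligned
print(transfer_time_ns(0x299A, 4, 25))  # case (b): crosses a doubleword boundary
```

An operand that straddles a doubleword boundary always costs a second memory cycle, regardless of which byte within the doubleword it starts at.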


I/O address space in the 386 


The 80386 can access a maximum of 65,536 input ports and 65,536 output ports 
using the IN and OUT instructions. In this regard, the 386 is exactly like the 8088/86/286 
microprocessors. 


Review Questions 


1. BHE and A0 are associated with the (80386SX, 80386DX) processor.
2. The 80386SX is (16, 32) bits externally.
3. Exactly how many pins are set aside for the address in the 386?
4. The BE2 pin is associated with which part of the data bus?




5. An 80386 of 20 MHz requires a crystal frequency of ________.


6. Give the first physical address location where the 80386 looks for an opcode upon 
RESET. 


7. With the same frequency, the 80386SX has bus bandwidth (twice, 
half) that of the 80386. 
8. Find the memory cycle time for an 80386 of 20 MHz. 


SECTION 21.3: 80386 PROTECTED MODE 


The 80386 protected mode discussion applies equally to 486 and Pentium chips. 
Due to the complexity associated with 80386 protected mode, many long chapters are 
needed for this subject, and for this reason, here we simply provide an overview of the 386 
in protected mode. 


Protection mechanism in the 386 


As discussed in Chapter 1, physical addresses in the 8086 are calculated by shift- 
ing the segment register left and adding it to the offset. This is also the case for the 80286 
and subsequent x86 processors in real mode. In protected mode, however, the physical 
address of blocks of data or code is held by a look-up table and the segment register is no 
longer shifted left to calculate the physical address. Instead, it is used as an index into a 
look-up table in which the physical address of the operand or code is held. 
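
The difference between the two address calculations can be sketched in a few lines of Python. This is an illustration only: the dictionary used as the look-up table and the sample values in it are hypothetical, and a real descriptor table holds more than a base address.

```python
def real_mode_address(segment, offset):
    """Real mode: shift the segment register left by 4 bits and add the offset."""
    return (segment << 4) + offset


def protected_mode_address(selector, offset, lookup_table):
    """Protected mode: the segment register selects an entry in a look-up table.

    lookup_table is a hypothetical dict mapping a selector value to the
    physical base address held in the corresponding table entry.
    """
    base = lookup_table[selector]
    return base + offset


# Hypothetical table: selector 0008H refers to a block starting at 00400000H
table = {0x0008: 0x00400000}

print(hex(real_mode_address(0x1234, 0x0010)))
print(hex(protected_mode_address(0x0008, 0x0010, table)))
```

In the second function the value in the segment register never appears in the arithmetic; it only picks which table entry supplies the base.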

Another important change introduced in the 80386 is the protection mechanism. 
The lack of protection of the operating system or users' programs is one of the weakness- 
es of 8088/86-based MS-DOS. This weakness is due to the inability of the 8088/86 to 
block general instructions from accessing the core (kernel) of the operating system. In the 
8088/86, any program can go from any code segment to any code segment, so it is easy 
to crash the system. In contrast, the 80386 provides resources to the operating system that 
prevent the user from either accidentally or maliciously taking over the core (kernel) of 
the operating system and forcing the system to crash. Of course, this idea of protection is 
nothing new; it is commonly used in mainframes and minicomputers, where it is often 
referred to as user and supervisor mode. The 386 provides protection by allowing any data 
or code to be assigned a privilege level. The four privilege levels are 0, 1, 2, and 3, where 
the privilege level of 0 is the highest and level 3 is the lowest. While operating systems 
are always assigned the highest privilege level (level 0), the user and applications such as 
word processors are assigned the lowest privilege level (level 3). Since the user is 
assigned the lowest privilege level, any attempt by the user to take over the operating sys- 
tem is blocked. Higher privilege levels can access lower levels, but not the other way 
around. Again, it must be emphasized that the protection mechanism can be used only 
when the 80386 is switched to protected mode. 


Virtual memory 


Another major feature of the 80386 is its ability to access virtual memory. A CPU 
with virtual memory is fooled into thinking that it has access to an unlimited amount of 
physical (DRAM) memory. DRAM primary memory is also called main memory. In this 
scheme, every time the CPU looks for certain information, the operating system will first 
search for it in main DRAM memory and if it is not there, it will bring it into RAM from 
secondary memory (hard disk). What happens if there is no room in RAM? It is the job 
of the operating system to swap data out of RAM and make room for new data. Which 
data will be swapped out depends on how the operating system is designed. Some operating
systems use the LRU (least recently used) algorithm to swap data in and out of primary
memory (DRAM). In the LRU method, the operating system keeps track of how recently
each piece of data was used, and when there is need for room it will swap out the least
recently used data to hard disk to make room for the new data. The total amount of RAM
on the computer could be only 2G with a hard disk capacity of 100G bytes, but the CPU is
fooled into thinking that it has access to all 100G of memory. Among the operating
systems, Microsoft Windows 2000, XP, and Vista, all the variations of Unix, Sun
Microsystems' Solaris, and Mac OS X use the capability of the 80386's virtual memory.
Since MS-DOS was written for the 8088/86 microprocessors, it does not have virtual memory.
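
The LRU idea can be illustrated with a small simulation. The Python sketch below is a simplified model under our own assumptions: the class name LRUMemory is hypothetical, main memory holds a fixed number of equal-size blocks, and real operating systems use approximations rather than an exact ordered list.

```python
from collections import OrderedDict


class LRUMemory:
    """Toy model of main memory that swaps out the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # blocks in RAM, least recently used first

    def access(self, block):
        """Touch a block; return the block swapped out to disk, or None."""
        if block in self.resident:
            self.resident.move_to_end(block)  # now the most recently used
            return None
        evicted = None
        if len(self.resident) >= self.capacity:
            evicted, _ = self.resident.popitem(last=False)  # drop the LRU block
        self.resident[block] = True
        return evicted


ram = LRUMemory(capacity=2)
for name in ("A", "B", "A", "C"):
    print(name, "->", ram.access(name))
```

With room for two blocks, the access sequence A, B, A, C swaps out B, because A was touched more recently than B when C arrives.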

To implement virtual memory, two methods are used: segmentation and paging. 
In segmentation, the size of the data swapped in and out can vary from 1 byte to a few 
megabytes (in 80386, 80486, and Pentium, the upper limit can be as high as 4 gigabytes). 
In paging, the size is a multiple of one page of 4096 (4K) bytes. Paging is used widely 
since it prevents memory fragmentation, where available memory becomes fragmented 
into small sections of varied sizes. When this happens, the operating system must contin- 
uously move files around to make room for the new files, which could be any size. Paging 
makes the job of the operating system much easier since all the files will be a multiple of 
4K bytes. If the size of a file is not a multiple of 4K bytes (which is the case most of the 
time), the operating system will leave the unused portion empty and the next file will be 
placed on a 4K boundary. This is similar to the cluster in floppy and hard disks. As shown 
in Chapter 19, the disk allocates space to each file in clusters. For example, if 4 sectors
are used for each cluster, each cluster can store 2048 (4 x 512) bytes. If a given
file is 12,249 bytes, the operating system will assign a total of 6 clusters or 12,288 (6 x
2048 = 12,288) bytes. All bytes between 12,249 and 12,288 are unused. This results in
wasting some memory space on the disk but at the same time makes the design of the disk
controller and operating system much easier. This concept applies as well to the paging 
method of virtual memory as far as the allocation of main memory (DRAM) to data and 
code is concerned. One can briefly define the segmentation and paging virtual memory 
mechanisms in the following statement. In segmentation virtual memory, the file can be 
any byte size, located anywhere it can fit into main memory. In paging virtual memory, 
the file is always a multiple of 4096 bytes and located on a 4K-byte boundary in main 
memory. 
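
The rounding to a 4K boundary described above amounts to a ceiling division. The following Python sketch is an illustration only; the function names are our own, and the 12,249-byte size is simply the file size used in the cluster discussion.

```python
PAGE_SIZE = 4096  # one page is 4K bytes


def pages_needed(size_bytes):
    """Number of 4K pages required to hold size_bytes (rounded up)."""
    return (size_bytes + PAGE_SIZE - 1) // PAGE_SIZE


def unused_bytes(size_bytes):
    """Bytes left empty in the last page."""
    return pages_needed(size_bytes) * PAGE_SIZE - size_bytes


print(pages_needed(12249), unused_bytes(12249))
```

The same two functions work for disk clusters if PAGE_SIZE is replaced by the cluster size.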

All high-performance RISC microprocessors use paging virtual memory only and 
none use the segmentation method. The reason that 386, 486, and Pentium processors sup- 
port segmentation (in addition to paging) is that they had to stay compatible with the 
8086's 64K-byte segment size. 


Segmentation and descriptor table 


In segmentation virtual memory, the segment registers are used as selectors into 
the descriptor table, where all the information about a given piece of data and code is kept. 
The descriptor table uses 8 bytes of space to provide the following information about a 
given piece of code or data. 


1. 4 bytes for the A0-A31 address, where the code (or data) is located in main memory.
This allows the 386 to access any memory location within its 4-gigabyte
address space. Notice in Figure 21-8 that A23-A0 is provided by bytes 2, 3, and 4,
but A31-A24 is provided by byte 7.

2. L0-L19: This 20-bit limit is used for checking the segment size and is limited to 1
megabyte. Notice that bytes 0 and 1 provide L0-L15, and D0-D3 of byte 6 is set
aside for L16-L19. This provides the scheme whereby the 1 megabyte limit
imposed on data or code is checked. Since the limit for the segment-oriented
8086/286 is 64K bytes (2^16 = 64K), the upper 4 bits must all be zeros. In the 386,
however, the segment limit can be raised to 4 gigabytes. To do that the G (granularity)
bit is set to high. If G = 0, L0-L19 is used as a number of bytes for the limit,
but if G = 1, L0-L19 is used as a multiple of 4K for the segment limit. This gives
2^20 x 2^12 = 2^32 = 4 gigabytes address range, making it possible for the 386 to
have segments as large as 4 gigabytes. This is quite a relief for software writers of
database and other application packages since the size of data (e.g., a big array) can
go as high as 4 gigabytes and is no longer limited to 64K. In the case of the 286,
when the size of the data section of the program was larger than 64K, programmers had to
do lots of software manipulation to overcome this limitation. This also explains the
origins of the memory models of SMALL, MEDIUM, LARGE, and so on, widely
used in Assembly and C programs.

3. The access byte allows protection of a given piece of data or code by assigning the 
privilege levels of 0, 1, 2, and 3 to it, where 0 indicates the highest privilege level 
and 3 is the lowest privilege level. D0-D7 of the access byte are described next.


SEGMENT BASE 15...0 | SEGMENT LIMIT 15...0
BASE 31...24 | G | D | 0 | AVL | LIMIT 19...16 | P | DPL | S | TYPE | A | BASE 23...16

BASE   Base Address of the segment
LIMIT  The length of the segment
P      Present Bit 1 = Present 0 = Not Present
DPL    Descriptor Privilege Level 0-3
S      Segment Descriptor 0 = System Descriptor 1 = Code or Data Segment Descriptor
TYPE   Type of Segment (3 bits: X, E, R/W)
A      Accessed Bit
G      Granularity Bit 1 = Segment length is page granular 0 = Segment length is byte granular
0      Bit must be zero for compatibility with future processors
AVL    Available field for user or OS

In a maximum-size segment (i.e., a segment with G=1 and segment limit 19...0 = FFFFFH), the lowest
12 bits of the segment base should be zero (i.e., segment base 11...000 = 000H).

Figure 21-8. Descriptor Table Entry


A (accessed) bit 


If the data or code is accessed (used), A = 1; otherwise, A = 0. This allows the 
operating system to monitor the A bit periodically to see if the CPU is using this piece of 
code or data. If a piece of code or data has not been used recently, the next time the oper- 
ating system needs to make room in main memory for new pieces of code (or data), it can 
move this code (or data) back to the hard disk. The A bit also allows the operating system 
to decide if a given piece of information (code or data) needs to be saved. For example, if 
a piece of data has not been accessed, the operating system can trash it and avoid wasting 
time saving it on the hard disk. On the other hand, if the data was accessed and it was writ- 
ten into, the operating system must save a copy of it on the hard disk before it abandons 
it to create room in main memory for some other data or code. 




R/W (read/write) bit 


This bit allows code or data to be read protected or write protected. For example, 
the core of the operating system can be write protected, which prevents the user from writ- 
ing into it and crashing the system. In the case of DOS, any program can use the DEBUG 
utility and alter the core of the operating system residing in main memory (DRAM), there- 
by crashing the PC. 

X bit 

This has a different meaning for the data segment and code segment. In the case
of data, it indicates whether the segment should expand downward as the stack segment
grows, or upward as the data segment grows. In the case of the code segment, it is used to
enforce certain rules of privilege level access.


E bit 
This indicates if the information is executable (E = 1), such as code, or nonexecutable
(E = 0), such as data and stack. This bit also affects the way the X and R/W bits
are interpreted.


S bit 


This indicates if the descriptor belongs to the code and data segment (S = 1) or if 
it is a system segment descriptor (S = 0). 


DPL (descriptor privilege level) bits 


This allows one of the combinations, 00, 01, 10, or 11, to be assigned to the code 
or data, indicating the privilege level. 


P (present) bit 


This indicates if the piece of code or data is present in main memory (DRAM). If 
it is present (P = 1), the CPU will process it. If it is not present (P = 0), the CPU causes 
an exception and the exception handler of the operating system will bring the desired 
piece of code or data into main memory from the hard disk. When the operating system 
does so, it sets P = 1 to indicate that the information is now present in main memory. 

The descriptor table is built by the operating system for every piece of code and
data. The descriptor table register (DTR) inside the 386 holds the physical address of
where the table is located in the 4-gigabyte address space, which means that the descriptor
table register (DTR) is a 32-bit register. When the CPU changes the contents of a segment
register (CS, DS, and so on), it uses the segment value as an index into the descriptor
table and pulls into the CPU from the descriptor table all 8 bytes belonging to this segment.
These 8 bytes are saved in the invisible part of the segment register inside the 386,
which means that every segment register inside the 386 has an 8-byte extension, which is
not visible to the programmer. The pulling of an 8-byte table entry into the CPU for every
change of segment register is time consuming but afterward, the CPU has all the information
it needs to access a piece of code or data. The addition of two new segment registers,
FS and GS, in the 386, plus the presence of CS, DS, SS, and ES, helps the CPU always to
have a total of 6 descriptor table entries available inside the CPU. If code or data is not
held by one of these 6 descriptor table entries, the CPU must go through the long process
(it takes 22 clock cycles) of pulling them into the CPU. As we will show later in this
section, this problem is solved in the paging method.

Example

From Figure 21-8 we have the following access byte for code and data.
P DPL 1 1 X R A (access byte for code segment)
P DPL 1 0 X W A (access byte for data segment)

Discuss the following access bytes.
(a) 10011011 (b) 10010111 (c) 11110001

Solution:

(a) This is an access byte for code segment, present, accessed, and privilege level of 00 (highest).
(b) This is an access byte for data segment, present, accessed, privilege level of 00 (highest), and
both read and write accessible.
(c) This is an access byte for data segment, present, accessed, privilege level of 11 (lowest), and
write protected.
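
The access bytes discussed in the example can be taken apart with shifts and masks. The Python sketch below is an illustration only; the function name is our own, and the bit positions assumed here (P in bit 7, DPL in bits 6-5, S in bit 4, E in bit 3, X in bit 2, R/W in bit 1, A in bit 0) are the ones consistent with the three solutions given in the example.

```python
def decode_access_byte(value):
    """Split an 8-bit access byte into its fields (bit positions are assumed)."""
    return {
        "P": (value >> 7) & 1,     # present
        "DPL": (value >> 5) & 3,   # descriptor privilege level, 0-3
        "S": (value >> 4) & 1,     # 1 = code or data, 0 = system descriptor
        "E": (value >> 3) & 1,     # 1 = executable (code), 0 = data or stack
        "X": (value >> 2) & 1,
        "RW": (value >> 1) & 1,
        "A": value & 1,            # accessed
    }


for byte in (0b10011011, 0b10010111, 0b11110001):
    print(format(byte, "08b"), decode_access_byte(byte))
```

For 11110001 the decoder reports DPL = 3, E = 0, and RW = 0, which matches the description of a lowest-privilege, write-protected data segment.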

Looking at the 8 bytes of the descriptor table, one might ask why Intel did not 
assign 32-bit physical addresses of desired code or data in consecutive bytes, instead of 
using bytes 2, 3, and 4, and then byte 7. The reason is the 80286 CPU. In the 286
protected mode, bytes 2, 3, and 4 are used for the 24-bit address (A0-A23), and bytes 6 and 7
had to be zero. This led Intel to use byte 7 for the A31-A24 part of the physical address
of the 386. Byte 6 is used for raising the limit and the G bit, among other things. See 
Figure 21-8. 
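
Because the base and limit are scattered across the 8 bytes, reassembling them takes a few shifts. The following Python sketch is an illustration under the byte layout described above (limit in bytes 0-1 and the low 4 bits of byte 6, base in bytes 2-4 and byte 7, access byte in byte 5, G in the top bit of byte 6); the function name and the sample descriptor bytes are hypothetical.

```python
def parse_descriptor(raw):
    """Reassemble base, limit, and size from an 8-byte descriptor (byte 0 first)."""
    limit = raw[0] | (raw[1] << 8) | ((raw[6] & 0x0F) << 16)        # L0-L19
    base = raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[7] << 24)  # A0-A31
    granular = (raw[6] >> 7) & 1                                      # G bit
    # G = 0: limit counts bytes; G = 1: limit counts 4K-byte units
    size_bytes = (limit + 1) * 4096 if granular else limit + 1
    return {"base": base, "limit": limit, "G": granular,
            "access": raw[5], "size_bytes": size_bytes}


# Hypothetical descriptor: base 12400000H, limit FFFFFH, G = 1, access byte 9BH
sample = bytes([0xFF, 0xFF, 0x00, 0x00, 0x40, 0x9B, 0xCF, 0x12])
info = parse_descriptor(sample)
print(hex(info["base"]), hex(info["limit"]), info["G"], info["size_bytes"])
```

With the limit at its maximum of FFFFFH and G = 1, the computed segment size is 2^32 bytes, the 4-gigabyte ceiling mentioned earlier.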


Local and global descriptor tables 


There are two types of descriptor tables for the 386: the local descriptor table 
(LDT) and the global descriptor table (GDT). The GDT is used for the system, and indi- 
vidual tasks can have their own LDT. How do the segment registers know which one they 
are accessing? The third bit (TI) of the segment register (referred to as the selector) always 
indicates which table should be used. See Figure 21-9. 


64 terabytes of virtual memory 


Figure 21-9. LDT and GDT Selection (the TI bit of the selector chooses between the local descriptor table and the global descriptor table)

As seen in Figure 21-9, the 14 bits of the selector (segment) register can have
16,384 (2^14) possible combinations. Each possible value can access a descriptor that can
hold addresses of memory chunks as large as 4 gigabytes. Therefore, we have 2^14 x 2^32 =
2^46 = 64 terabytes of virtual memory for the 386 (recall that tera is defined as 2^40). To put it
another way: The 386 can access 64 terabytes of hard disk (virtual memory) as long as the 
virtual memory is broken down into 4-gigabyte pieces, since it has only 32 address pins. 
While the segment limit in the 8086/286 is 64K bytes, the segment limit in the 386 was 
raised to 4G. One of the drawbacks of 386 segmentation is its variable segment size, which 
leads to memory fragmentation. Another is the absence of what is called a dirty bit in the 
access byte of the descriptor table. Assume that there is some memory that can be written 
into. The accessed (A) bit indicates if the data has been accessed but does not indicate if 
any new data was written into it. Why should the operating system care whether the mem- 
ory is altered (written into)? If the data is altered, it is the job of the operating system to 
save it on the disk to make sure that the hard disk always has the latest data. If the dirty 
bit is zero (D = 0), it means that the data has not been altered and the operating system 
can abandon it when it needs room for new data (or code) since the original copy is on the 
hard disk. This will save time for the operating system. If the dirty bit is one (D = 1), the 
operating system must save the data before it is lost or abandoned. Both problems of vari- 
able segment size and lack of a dirty bit in segmentation are fixed in the paging method 
of virtual memory. 
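
The selector fields and the 64-terabyte figure can be checked numerically. The Python sketch below is an illustration only; the function name is our own. It assumes the standard selector layout in which bit 2 is TI (1 selects the LDT, 0 the GDT), the upper 13 bits are the table index, and the two lowest bits, not discussed in this section, carry a requested privilege level.

```python
def decode_selector(selector):
    """Split a 16-bit selector into table index, table choice, and low 2 bits."""
    return {
        "index": selector >> 3,                          # 13-bit table index
        "table": "LDT" if selector & 0b100 else "GDT",   # TI bit
        "rpl": selector & 0b11,                          # requested privilege level
    }


# 13 index bits plus the TI bit give 2^14 descriptors of up to 2^32 bytes each
VIRTUAL_SPACE_BYTES = (2 ** 14) * (2 ** 32)

print(decode_selector(0x0008))
print(decode_selector(0x000C))
print(VIRTUAL_SPACE_BYTES // 2 ** 40, "terabytes")
```

Selectors 0008H and 000CH carry the same index and differ only in the TI bit, so they name entry 1 of two different tables.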


Paging 


Paging of virtual memory was a new addition to the 386, but the segmentation 
method was left over from the 80286. All RISC and Motorola 680x0 processors support 
paging virtual memory. In paging virtual memory, main memory is divided into fixed 4K- 
byte chunks instead of variable sizes of 1 byte to 4 gigabytes, as in segmentation. If a 
given piece of code or data is not present in main memory, the operating system brings it 
into main memory from the hard disk, 4K at a time. This is a much more manageable size
of memory to transfer than, for example, a 64K-byte segment. Since the size of memory 
is reduced to 4K bytes, the 386 keeps a table for the 32 most recently used pages present 
in main memory to prevent the CPU from swapping data in and out of main memory 
unnecessarily. This table is called the translation lookaside buffer (TLB) and is kept inside 
the 386. To understand the importance of the TLB, let's look at the way paging works. 
First, the term linear address in the 386 must be clarified. The 32-bit address of the 
operand is called the linear address. This linear address can be a direct value such as in 
the instruction "MOV EAX,[50000000]" or may be pointed to by any of registers EDI, 
ESI, EBX, EDX, and so on, as in the instruction, "MOV EAX,[EBX]". This linear address 
must be translated into a physical address to be put on the A31-A0 address pins and sent
out for the address decoder to find the location in RAM or ROM. In other words, the
address 50000000H in instruction "MOV EAX,[50000000]" does not necessarily refer to the
actual RAM or ROM address 50000000H. See Figure 21-10.


Going from a linear address to a physical address 


In paging, the linear address is divided into three parts. The upper 10 bits
(A31-A22) are used for an entry into what is called a page directory. There is a 32-bit
register, CR3, inside the 386 that holds the physical base address of the page directory. Since
the upper 10 bits of the linear address point to the entry in the page directory, there
can be 1024 page directory entries (2^10 = 1024). Each entry in the page directory is 4 bytes of
page table descriptor. Of the 4 bytes of each page table descriptor, the upper 20 bits are
used to point to another table, where the physical address of the 4K page frame is held.
How is the correct entry in the table located? A21-A12 (10 bits total) of the linear address
are used to point to one of the page table entries. Again, each entry in this second table
has 4 bytes. The upper 20 bits are for A31-A12 of the physical address of where data is
located. The lower 12 bits of the physical address are the lower 12 bits of the linear
address. See Figures 21-11 and 21-12. In other words, only the lower 12 bits of the linear
address match the lower 12 bits of the physical location in RAM or ROM where data is
located, and the upper 20 bits of the linear address must go through two levels of
translation tables to get the actual physical address of the beginning page where the data is held.
This seems like a very long and inefficient process, and it is. This is the reason for the TLB
(translation lookaside buffer). The TLB inside the 386 holds the list of the most recently
(commonly) used physical addresses of the page frames. When the CPU wants to access
a piece of information (data or code) by providing the linear address, it first compares the
20-bit upper address with the TLB to see if the table entry for the desired page is already
inside the CPU. This results in two possibilities: (1) If it matches, it picks the 20-bit
physical address of the page and combines it with the lower 12 bits of the linear address to
make a 32-bit physical address to put on the 32 address pins to fetch the data (or code);
(2) if it does not match, the CPU must fetch into TLB the page table entry from memory.

Figure 21-10. Paging Mechanism (two-level paging scheme: the linear address is split into DIRECTORY, bits 31-22; TABLE, bits 21-12; and OFFSET, bits 11-0; CR3 points to the page directory)

[Figure 21-11: page directory entry, with the page table address in bits 31-12]

Figure 21-12. Page Table Entry (Points to Page) (page frame address in bits 31-12)
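
The two-level translation and the role of the TLB can be sketched as a short simulation. The Python code below is a simplified model under our own assumptions: physical memory is a hypothetical dictionary mapping the address of each 4-byte table entry to its value, the TLB is a plain dictionary with no size limit, and the present bit and protection checks are ignored.

```python
def split_linear(linear):
    """Split a 32-bit linear address into (directory, table, offset) fields."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(linear, cr3, memory, tlb):
    """Translate a linear address to a physical address, filling the TLB on a miss."""
    page, offset = linear >> 12, linear & 0xFFF
    if page in tlb:                                   # TLB hit: no table walk
        return (tlb[page] << 12) | offset
    directory, table, _ = split_linear(linear)
    pde = memory[cr3 + 4 * directory]                 # page directory entry
    pte = memory[(pde & 0xFFFFF000) + 4 * table]      # page table entry
    tlb[page] = pte >> 12                             # remember the page frame
    return (tlb[page] << 12) | offset


# Hypothetical tables: directory at 1000H, one page table at 2000H
memory = {0x1500: 0x00002007,   # directory entry 140H points to the table at 2000H
          0x2004: 0x00ABC007}   # table entry 1 points to page frame ABCH
tlb = {}
print(hex(translate(0x50001234, 0x1000, memory, tlb)))
print(tlb)
```

After the first call the page is in the TLB, so any later access within the same 4K page is translated without reading either table.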

Each entry in the page table has 4 bytes. Of these 4 bytes, 20 bits are used to hold
the A31-A12 physical address of the page frame. The rest are used for the P (present) bit,
D (dirty) bit, R/W (read/write) bit, A (accessed) bit, and finally, U/S (user/supervisor) bit,
which indicates the privilege level of given data or code. In the segmentation method
there were 2 bits for privilege level, giving rise to four levels of protection of 0, 1, 2, and
3, where level 3 was assigned to the lowest level and level 0 to the highest level.
However, in the paging method, there is only 1 bit for privilege level, which is called U/S
(user/supervisor). If U/S = 1, it is user privilege level and is equivalent to level 3 in
segmentation. If U/S = 0, it is supervisor level, belonging to the operating system and system
kernel. The supervisor privilege level is equivalent to levels 0, 1, and 2 in the
segmentation method.
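
The fields of a page table entry can be extracted in the same way as those of the access byte. The Python sketch below is an illustration only; the function names are our own, and the bit positions (P in bit 0, R/W in bit 1, U/S in bit 2, A in bit 5, D in bit 6) are taken from Intel's documented entry format rather than from the discussion above.

```python
def decode_pte(entry):
    """Split a 32-bit page table entry into the page frame and its status bits."""
    return {
        "frame": entry >> 12,      # upper 20 bits: page frame address
        "P": entry & 1,            # present
        "RW": (entry >> 1) & 1,    # read/write
        "US": (entry >> 2) & 1,    # user/supervisor
        "A": (entry >> 5) & 1,     # accessed
        "D": (entry >> 6) & 1,     # dirty
    }


def must_write_back(entry):
    """A page written into (D = 1) must be saved to disk before it is abandoned."""
    return decode_pte(entry)["D"] == 1


print(decode_pte(0x00ABC067))
print(must_write_back(0x00ABC067), must_write_back(0x00ABC027))
```

The second entry has A = 1 but D = 0: it was read but never written, so the operating system can drop it without saving it.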


The bigger the TLB, the better 


Since the TLB in the 386 keeps the list of addresses for the 32 most recently used 
pages, it allows the CPU to have access to 128K bytes (32 x 4 = 128) of code and data at 
any time without going through the time-consuming process of converting the linear 
address to a physical address (two-stage table translation). See Figure 21-13. Therefore, 
one way to enhance the processor is to increase the number of pages held by the TLB. This 
is what the Pentium has done, as we will see in Chapter 23. Table 21-7 compares paging 
and segmentation. 


Table 21-7: Paging and Segmentation Comparison 
Paging          Segmentation
4K bytes Any size 


4K-byte aligned 
Yes 


Virtual 8086 mode 


A major dilemma for designers of the Intel 386 was how to enhance the processor and 
still run 8088/86 software based on MS-DOS in protected mode. They solved this dilemma 
by adding the virtual 8086 mode to the 386. In virtual 8086 mode, the 386 partitions 
memory into 1-megabyte sections, each assigned to one task. It also runs each task as if it 
were an 8086 program, not concerned with privilege levels. In other words, the virtual 8086 
mode of the 386 microprocessor allows any program written for DOS to be run unchanged 
under one task, where each task can have its own 1 megabyte of memory. This means that 
in virtual 8086 mode, the 386 uses the SEG:OFFSET concept used in the 8088/86 
microprocessor. Microsoft Windows uses the virtual 8086 mode of the 80386 microprocessor 
to run multiple programs written for the 8088/86. The difference is that in MS-DOS, only 
one task can be active at a time and all other tasks sit idle (dormant) while one task is 
being run, but in Windows each task is given a slice of the CPU's time, and many tasks can 
be active concurrently. For example, a word processor can be used while the modem/FAX is 
receiving and sending data, a spreadsheet program such as MS Excel is doing some calculations, 
and a disk is being formatted. Of course, since there is only one microprocessor taking 
care of all these tasks, it is the job of the multitasking operating system such as Windows 
to slice the CPU time and assign each task time on a circular rotational basis. If there are 




Figure 21-13. Translation Lookaside Buffer (diagram labels: linear address, 32-entry translation lookaside buffer with 98% hit rate, page directory, physical memory) 


too many tasks and all are active, they all seem to be slow since each task gets less time 
(attention) from the CPU. Of course, one way to solve this slowness is to use high-per- 
formance CPUs with GHz speed. The multitasking operating system can be cooperative 
or preemptive. In cooperative multitasking, two or more applications cooperate with each 
other in taking turns to use the CPU alternately. If one application misbehaves, it can 
cause the whole system to be unstable and crash. In preemptive multitasking, a task can 
be interrupted preemptively at any point by another program. If a task is interrupted by 
another task, its present state will be saved by the operating system and it will be serviced 
after the new task is given a chance to use the CPU. 
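
The "circular rotational" time slicing described above can be sketched as a round-robin queue. This is only an illustration of the idea, not how Windows is implemented; the task names, the time units, and the function name are all hypothetical.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """Simulate time slicing.

    tasks maps a task name to the CPU time it still needs (arbitrary units).
    Returns (order in which tasks receive the CPU, order of completion).
    """
    queue = deque(tasks.items())
    schedule, finished = [], []
    while queue:
        name, remaining = queue.popleft()
        schedule.append(name)                 # this task gets the CPU now
        remaining -= time_slice
        if remaining > 0:
            queue.append((name, remaining))   # state saved, back of the line
        else:
            finished.append(name)
    return schedule, finished


schedule, finished = round_robin(
    {"word processor": 3, "fax": 1, "spreadsheet": 2}, time_slice=1
)
print(schedule)
print(finished)
```

With more tasks in the queue, each task waits longer between its turns, which is why a heavily loaded system feels slow.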


Review Questions 


1. True or false. In protected mode, the 386 physical address is calculated by shifting 
the segment register value and adding the offset. 
2. Virtual memory refers to ___ (main DRAM, hard disk) memory. 
3. How does the operating system decide which code (or data) should be abandoned 
to make room for new code? 
4. In protected mode (segmentation), where is the physical address of the desired code 
or data located? 
5. Of the 8 bytes of the descriptor table entry, which one(s) are used for the physical 
address? Assume that they are numbered from 0 to 7. 
6. When a piece of code is run, which bit of the access byte is modified? 
7. In 386 segmentation, level 3 is assigned the ___ (lowest, highest) privilege. 
8. In 386 segmentation, level 0 is assigned the ___ (lowest, highest) privilege. 
9. How many privilege levels are there in 386 paging? 
10. True or false. In 386 paging, the linear and physical addresses are the same. 
11. To get the physical address in 386 paging the linear address must go through ___ 
(1, 2) stage(s) of translation. 
12. The virtual 8086 mode was introduced in the ___ (80286, 80386). 
13. True or false. In MS-DOS, only one task can be active at a time. 
14. Why is Windows (Vista, XP) but not DOS a true multitasking operating system? 


CHAPTER 21: 386 MICROPROCESSOR: REAL VS. PROTECTED MODE 553 


PROBLEMS 


SECTION 21.1: 80386 IN REAL MODE 


1. Which microprocessors support the instructions PUSHA and POPA? 
2. Explain the function of PUSHA. It is equivalent to what set of instructions? 
3. Explain the function of POPA. It is equivalent to what set of instructions? 
4. Which microprocessors support "SHL dest,immediate"? 
5. Find the contents of the destination register for each of the following. 
(a) MOV AX,43H 
    SHL AX,4 
(b) MOV BX,8000H 
    SHR BX,16 
(c) MOV CX,0AAAH 
    ROL CX,8 
(d) MOV CX,0AAAH 
    ROL CX,12 
6. True or false. The 80286 was the first x86 to abandon multiplexing of the address 
and data buses. 
7. In which x86 microprocessor was the concept of virtual memory introduced? 
8. In which of the x86 microprocessors was the 2-clock memory cycle introduced? 
9. Which of the following instructions will cause an error in the 386? 
(a) MOV EBX,AX (b) MOV ECX,BX 
(c) ADD ECX,EDX (d) ADD EDX,AL 
(e) MOV EBX,SI (f) ADD SI,DI 


10. Show how data is stored in "MOV [3500],EBX". Assume that EBX = 9834F543H. 

11. Show how data is stored in "MOV ES:[1000],ECX" (ECX = 07B324H). 

12. Which registers can be used for the scaled index addressing mode? 

13. Write a 386 program to add a factor of 100 to an array of 10 DWORD data. Use 
the scaled index addressing mode. 

14. Write a 386 program to add two multibyte data items of 8-byte size and store the 
result. Use the scaled index addressing mode. 

15. Indicate all the registers that can be used for pointers in the 386. Also give their 
default segments. 

16. Find the destination register contents after execution of each of the following. 


(a) MOV 139,09 WD 
MOVSX EBX,BX 
(b) MOV eles 
MOVSX EDX,CL 
(c) MOV AH,7 
MOVZX ECX,AH 
(d) MOV AX,99H 
MOVZX EBX,AX 


17. Find the contents of EAX and EBX after execution of the following. 
MOV ECX,307F455H 
BSF EAX,ECX 
BSR  EBX,ECX 

18. Find the contents of AX and DX after execution of the following. 


MOV BX,98H 
BSF AX,BX 
BSR DX,BX 




19. What is the purpose of instructions MOVSX and MOVZX? 

20. True or false. In the instruction "MOVSX REG,REG", the source and destination 
registers must match in size. 


SECTION 21.2: 80386: A HARDWARE VIEW 


21. BE0-BE3 are active ___ (low, high). 
22. True or false. The address and data bus in the 386 are multiplexed. 
23. Which part of the data bus is activated if BE0 = 0 and BE1 = 0 (at the same time)? 
24. Which part of the data bus is activated if BE2 = 0 and BE3 = 0 (at the same time)? 
25. Which part of the data bus is activated if BE0 = 0 and BE3 = 0 (at the same time)? 
26. Which part of the data bus is activated if BE1 = 1 and BE2 = 0 (at the same time)? 
27. A 25-MHz 386 is connected to CLK2 of ___ MHz. 
28. Show the status of CS, IP, A31-A2, and BE3-BE0 in the 386 upon RESET. 
29. What are the implications of your answer to Problem 28? 
30. For what addresses in the 386 PC is BIOS ROM duplicated, and why? 
31. Draw the bus cycle for the nonpipelined 386. Show the address, data, and READY 
signals. 
32. Find the total bus cycle necessary to transfer the operand in the instruction "MOV 
[2002],ECX". 
33. For aligned data, the addresses for DWORD type data in the 386 must have ___ 
as the lower hex digit. 
34. Find the memory cycle time for a 33-MHz 386. 


SECTION 21.3: 80386 PROTECTED MODE 


35. What is virtual memory? 
36. True or false. The CPU requests data from virtual memory before it requests data 
from main memory. 
37. While main memory is made of ___ (DRAM, hard disk), virtual memory 
is ___ (DRAM, hard disk). 
38. What is the difference between the real and protected modes of the 386 in terms of 
memory space? 
39. To access the entire 4 gigabytes of the 386, the CPU must be in ___ mode. 
40. True or false. The 286 supports both segmentation and paging virtual memory. 
41. True or false. The 386 supports both segmentation and paging virtual memory. 
42. True or false. In the 286, the segment size can be 1 byte to 16 megabytes. 
43. True or false. In the 386, the segment size can be 1 byte to 4 gigabytes. 
44. For the 386, what is the page size in paging virtual memory? 
45. How many bytes does each entry in the descriptor table use? 
46. State the difference between real mode and protected mode as far as the physical 
address of the operand is concerned. 
47. How many bits are set aside for the addresses in the descriptor table, and where are 
they located in the descriptor table? 
48. To make the descriptor table of a 386 286-compatible, we must make bytes 7 and 8 
all ___ (0s, 1s). 
49. How many bits are set aside for the segment limits in the descriptor table, and 
where are they located in the descriptor table? 
50. True or false. Every piece of data or code accessed by the 386 in protected mode 
must have an access byte. 




51. 00 is the ___ (lowest, highest) privilege level and 11 is the ___ (lowest, 
highest) one. 
52. What is the function of bit A in the access byte of a 386 descriptor table entry? 
53. What is the function of bit P in the access byte of a 386 descriptor table entry? 
54. State the characteristic of each of the following access bytes. State for each 
whether it is for code or data. 
(a) 10010001 (b) 11110001 (c) 11110011 
(d) 11111011 (e) 10011011 (f) 11111011 
55. What does TLB stand for, and what is it used for? 
56. In the 386, state the difference between the linear and the physical address. 
57. To get the address of code or data in paging, the 386 converts from ___ 
(linear address, physical address) to ___ (linear address, physical address). 
58. In the 386, before the address of the data or code is fetched it is checked against 
the values held by the ___. 
59. What is the number of entries in the 386 TLB? 
60. True or false. In virtual 8086, the addresses are calculated by shifting the segment 
register left and adding it to the offset. 
61. State the differences between paging and segmentation virtual memory. 
62. How many privilege levels are there in paging? 


ANSWERS TO REVIEW QUESTIONS 


SECTION 21.1: 80386 IN REAL MODE 


1. 16-bit 
2. 16-bit 
3. 80286 
4. 80286 
5. c 
6. False 
7. BL = AA, BH = 77, BX = 77AA, EBX = 99FF77AA 
8. Little endian 
9. EAX, EBX, ECX, EDX, ESI, and EDI 
10. EA = 2000 + 8 x 100 = 2800H 
11. a l G E Only 
12. EDX = FFFFFFF7H 
13. ECX = 000000FBH 
14. DX = 3, AX = 000C 

SECTION 21.2: 80386: A HARDWARE VIEW 


1. 80386SX 
2. 16 
3. 34 pins since A31-A2 is 30 and 4 pins for the BE0, BE1, BE2, and BE3 
4. D23-D16 
5. 40 MHz 
6. FFFFFFF0H 
7. Half 
8. 1/20 MHz = 50 ns; therefore, it is 100 ns. 




SECTION 21.3: 80386 PROTECTED MODE 


1. False 

2. Hard disk 

3. According to rule of least recently used 
4. The descriptor table 

5. Bytes 2,3,4, and 7 

6. A (access) bit 

7. Lowest 

8. Highest 

9. Two: user and supervisor 
10. False 

11. 2 stages 

12. 80386 

13. True 


14. Because in Windows more than one task can be active, but in DOS only one task is 
active for a given period 




CHAPTER 22 


HIGH-SPEED MEMORY 
DESIGN AND CACHE 


OBJECTIVES 
Upon completion of this chapter, you will be able to: 


>> Explain how the introduction of wait states is implemented in the IBM PC 
to coordinate the memory cycle times of x86 CPUs and high-speed 
memory 
>> Define terms used in memory design, such as memory cycle time and 
memory access time 
>> Describe the various types of DRAM: standard mode, page mode, 
and static column mode 
>> Describe how the interleaving method is implemented to solve the 
problem of back-to-back DRAM access and the required precharge time 
>> Discuss the advantages of using DRAM for main memory and SRAM for 
cache 
>> Diagram the three types of cache organization: fully associative, direct 
mapped, and set associative 
>> Explain the write-back and write-through methods of updating main 
memory as cache data is altered 
>> Describe the cache replacement policies LRU and FIFO 
>> Contrast and compare EDO and FPM DRAM 
>> Describe the operation and purpose of SDRAM 
>> Explain the components and function of Rambus technology 




The potential power of high-performance microprocessors can be exploited only 
if memory is fast enough to respond to the microprocessor's need to fetch code and data. 
There is no use in choosing a fast processor and then interfacing it with slow memory. In 
this chapter we deal with issues of high-speed memory design. In Section 22.1 we look at 
read and write cycle times of the x86 family. In Section 22.2 we discuss various types of 
DRAMs, such as page mode and static column mode. In Section 22.3, the cache memo- 
ry option is discussed, and the way 386, 486, and Pentium processors use cache memory 
to increase system throughput is examined. Section 22.4 examines the newer and faster 
DRAMS of EDO, SDRAM, and Rambus technologies. 


SECTION 22.1: MEMORY CYCLE TIME OF THE x86 


When interfacing a microprocessor to memory, the first issue is how much time 
is provided by the CPU for one complete read or write cycle. In other words, what is the 
memory cycle time of the CPU? In the 8088/86 microprocessor, the memory cycle time 
consists of 4 clocks, which leaves plenty of time to access memory. The slowest 8088/86, 
with a working frequency of 5 MHz, has an 800-ns memory cycle (4 x 200 ns = 800, T = 
1/5 MHz = 200 ns), and the fastest 8088/86, with 10-MHz speed, will have a 400-ns 
memory cycle. A memory cycle of 400 ns means that the CPU can access memory every 
400 ns, and not faster. This is enough time to access even the slow and inexpensive 
DRAMs. However, for the 286, 386, 486, and Pentium, memory cycle time consists of 
only two T clocks. This makes memory design a challenging task, especially when the 
speed of the CPU goes beyond 20 MHz. Table 22-1 shows the memory cycle times for 
various speeds of x86 microprocessors. From Table 22-1 it can be seen that as the frequen- 
cy of the CPU is increased, the maximum amount of time allowed to access memory is 
decreased, forcing the designer either to use fast and expensive memory or to introduce 
wait states into the memory cycle. 
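
The cycle times quoted above all come from one relationship: memory cycle time = clocks per cycle x clock period, with the period T = 1/frequency. A short Python sketch of that arithmetic follows; the function names are illustrative, and only the frequencies mentioned in the text are used.

```python
def clock_period_ns(freq_mhz):
    """Clock period T in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def memory_cycle_ns(freq_mhz, clocks_per_cycle):
    """Zero-wait-state memory cycle time in nanoseconds."""
    return clocks_per_cycle * clock_period_ns(freq_mhz)


# (CPU, frequency in MHz, clocks per memory cycle) taken from the text
for cpu, mhz, clocks in [("8088/86", 5, 4), ("8088/86", 10, 4), ("80386", 25, 2)]:
    print(f"{cpu:8s} {mhz:3d} MHz  {memory_cycle_ns(mhz, clocks):6.1f} ns")
```

The same two functions can be used for any other frequency of interest, keeping in mind that 4 clocks applies to the 8088/86 and 2 clocks to the 286 and later processors.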


Introducing wait states into the memory cycle 


When the memory timing requirement of the CPU cannot be met, one option that 
designers have is to introduce wait states. All x86 microprocessors have the READY pin. 
When the microprocessor initiates the memory cycle, meaning that it puts the addresses 
on the address bus, the time at which it must have the data at the pins of the data bus is 
fixed and is shown in Table 22-1. This fixed amount of time can be extended by activat- 
ing the READY pin. Every time that the READY pin is activated, the CPU adds one extra 
clock to the memory cycle. For example, the 25-MHz 80386 has a memory cycle of 80 
ns (2 x 40 = 80) with zero wait states. If READY is activated only once during the mem- 
ory cycle, it adds one clock of 40 ns to the memory cycle, thereby giving the memory and 
decoding circuitry a total of 120 ns to get the information to the data pins of the CPU. This 
120 ns is spent on the following parameters: (1) memory decoding logic circuitry and 
address bus buffers (chipset), (2) access time of memory, and (3) the time it takes for sig- 
nals to travel from memory data pins to the data pins of the CPU, going through any logic 
gates on the pathway, such as FPGA chipset. Of these three parameters, memory access 
time is normally the longest. For more about logic families, see Chapter 26. If the allocat- 
ed memory cycle time is not enough, more wait states are needed, making the memory 
cycle time longer. 

In a 25-MHz 386 with 1 wait state, there is a 120-ns memory cycle time, mean- 
ing that the CPU can perform read and write operations no faster than every 120 ns. What 
happens if 140 ns is needed? Since the wait state is an integer multiple of the clock cycle 
(1, 2, 3, and so on), there is no other choice but to have 2 wait states. In other words, there 
is no such thing as 1.5 wait states. Wait states degrade computer performance, as shown 
in Example 22-1. It does not make sense to buy a high-frequency CPU, then interface it 
with slow memory. The next section will look at possible solutions to this problem. 
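
Because wait states come only in whole clocks, the number needed is a round-up of the shortfall. The sketch below assumes a 2-clock memory cycle and uses an illustrative function name; the small tolerance only guards against floating-point noise.

```python
import math


def wait_states_needed(freq_mhz, required_ns, clocks_per_cycle=2):
    """Smallest whole number of wait states that gives at least required_ns."""
    period = 1000.0 / freq_mhz                       # one clock, in ns
    shortfall = required_ns - clocks_per_cycle * period
    # Round up to whole clocks; never return a negative count.
    return max(0, math.ceil(shortfall / period - 1e-9))


ws = wait_states_needed(25, 140)                     # 25-MHz 386 needing 140 ns
print(ws, "wait states,", (2 + ws) * 40, "ns memory cycle")
```

For the 140-ns case in the text this yields 2 wait states, so the memory cycle stretches to 160 ns even though only 140 ns was required.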




Table 22-1: Memory Cycle Times for x86 

(Table of memory cycle times by CPU and clock frequency; legible row labels include the 80286, 80386DX, 80486DX, and Pentium, with the entries marked "2 clocks*" carrying the footnote below.) 

* From external DRAM or the secondary cache. 
Note: All memory cycle times are with zero wait states. 


Example 22-1 


Find the effective memory performance of a 25-MHz 386 CPU with one wait state. 


Solution: 


Since the 0 WS memory cycle is 80 ns (1/25 MHz = 40 ns and 2 x 40 = 80 ns), for 1 WS we have a 
memory cycle time of 120 ns. That means that the memory performance is the same as that of a 
16.6-MHz 80386 (120 ns/2 = 60 ns, then 1/60 ns = 16.66 MHz) as far as memory accessing is 
concerned. This is 67% of the performance of the 80386 with zero wait states. 
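
The same result can be obtained in one step by scaling the clock frequency by the ratio of clocks without and with wait states. A minimal sketch, assuming a 2-clock zero-wait-state cycle and an illustrative function name:

```python
def effective_mhz(freq_mhz, wait_states, clocks_per_cycle=2):
    """Frequency of a zero-wait-state CPU with the same memory cycle time."""
    return freq_mhz * clocks_per_cycle / (clocks_per_cycle + wait_states)


eff = effective_mhz(25, 1)
print(f"{eff:.2f} MHz, {100 * eff / 25:.0f}% of zero-wait-state performance")
```

Each additional wait state lowers the ratio further: 2/3 with one wait state, 2/4 with two, and so on.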


Review Questions 


1. Find the read/write cycle time of the following systems. 
(a) 40-MHz 386 with 0 WS (b) 50-MHz 486 with 1 WS 
(c) 66-MHz Pentium with 1 WS 

2. A given CPU has a read/write cycle time of 50 ns. What does this mean? 

3. Find the effective working frequency for memory access in each of the following. 
(a) 40-MHz 386 with 1 WS (b) 50-MHz 486 with 1 WS 

4. If a given CPU has a read cycle time of 60 ns and 10 ns is used for the decoder and 
address/data path delay, how much is left for memory access time? 

5. If a given system is designed with 1 WS and has a 90-ns memory cycle time, find 
the CPU's frequency if the read/write cycle time of this CPU is 2 clocks. 


CHAPTER 22: HIGH-SPEED MEMORY DESIGN AND CACHE 561 


SECTION 22.2: PAGE AND STATIC COLUMN DRAMS 


To understand interfacing memory to high-performance computers, the different 
types of available RAM must first be understood. Although SRAMs are fast, they are 
expensive and consume a lot of power due to the use of flip-flops in the design of the 
memory cell, as we discussed in Chapter 10. At the opposite end of the spectrum is 
DRAM, which is cheaper but is slow (compared to CPU speed) and needs to be refreshed 
periodically. The refreshing overhead together with the long access time of DRAM is a 
major problem in the design of high-performance computers. The problem of the time 
taken for refreshing DRAM is minimal since it uses only a small percentage of bus time, 
but the solution to the slowness of DRAM is very involved. One common solution is using 
a combination of a small amount of SRAM, called cache (pronounced "cash"), along with 
a large amount of DRAM, thereby achieving the goal of near zero wait states. Before we 
discuss such solutions, we must understand what resources are available to high-perform- 
ance system designers. To this end, the different types of available DRAM will be dis- 
cussed, and cache memory is discussed in Section 22.3. First we clarify some widely used 
terminology such as memory cycle time and memory access time. 


Memory access time vs. memory cycle time 


Memory access time is defined as the time interval between the moment the 
addresses are applied to the memory chip address pins and the time the data is available 
at the memory's data pins. The memory data sheets refer to it as tAA (address access time). 
Another commonly used time interval is tCA (access time from CS), which is measured 
from the time the chip select pin of memory is activated to the time the data is available. 
In some cases, notably EEPROM, tOE is the time interval between the moment OE 
(READ) is activated to the time the data is available. However, memory access time tAA 
is the one most often advertised. 

Memory cycle time is the time interval between two consecutive accesses to the 
memory chip. For example, a memory chip of 100 ns cycle time can be accessed no faster 
than 100 ns, which means that two back-to-back reads can be performed no faster than 
200 ns, and 3 back-to-back reads will take 300 ns, and so on. It must be noted that while 
in SRAM the memory cycle time is equal to memory access time, this is not so in DRAM 
memory, as discussed next. 


Types of DRAM 


There are different types of DRAM, which are categorized according to their 
mode of data access. These modes include standard mode, page mode, static column 
mode, and nibble mode. Although each mode is discussed separately below, often two of 
the above modes exist on the same DRAM chip. For example, page mode DRAM has 
standard mode as well. 


DRAM (standard mode) 


Standard mode (also called random access) DRAM, which has the longest memory 
cycle time, requires the row address to be provided first and then the column address 
for each cell. Each group is latched in by the activation of the RAS (row address select) and 
CAS (column address select) inputs, respectively. The access time is measured from the time 
that the row address is provided to the time that the data is available at the output data pin 
of the DRAM chip. This is the access time that is commonly advertised and is called tRAC 
(RAS access time, the access time from the moment RAS is provided). This is acceptable 
if we are accessing a random cell within DRAM. However, since most of the time data 
and code processed by the CPU are in consecutive memory locations and the CPU does 
not jump around to random locations (unless there is a JMP or CALL instruction), the 
DRAM will be accessed with back-to-back read operations. Unfortunately, DRAM cannot 
provide the code (or data) in the amount of time called tRAC if there is a back-to-back 
read from the same DRAM chip because DRAM needs a precharge time (tRP) after each 
RAS has been deactivated to get ready for the next access. This leads us back to the 
concept of memory cycle time for DRAM memory chips. The memory cycle time for memory 
chips is the minimum time interval between two back-to-back read/write operations. 
In SRAM and ROM, the access time and memory cycle time are always equal, but that is 
not the case for DRAMs. In DRAM, after RAS makes the transition to the inactive 
state (going from low to high), it must stay high for a minimum of tRP (RAS precharge) to 
precharge the internal device circuitry for the next active cycle. Therefore, in DRAM we 
have the following approximate relationship between the memory access time and memory 
cycle time. 


tRC = tRAC + tRP (This is for standard mode) 
read cycle time = RAS access time + RAS precharge time 


For example, if DRAM has an access time of 100 ns, the memory cycle time is 
really about 190 ns (100 ns access time plus 90 ns precharge time). To access a single loca- 
tion in such a DRAM, 100 ns is enough, but to access more than one successively, 190 ns 
is required for each access due to the precharge time that is needed internally by DRAM 
to get ready to access the next capacitor cell. Tables 22-2 and 22-3 show DRAM and 
SRAM memory cycle times, respectively. 


Table 22-2: DRAM Access Time vs. Cycle Time (4M x 1) 


| DRAM | RAS Access (tRAC) (ns) | Read Cycle (tRC) (ns) | RAS Precharge (tRP) (ns) | 


(Reprinted by permission of Motorola Corporation, Copyright Motorola Corp. 1993) 


Table 22-3: SRAM Access Time vs. Cycle Time 


| SRAM (IDT Product) | 
| IDT71258S25 | 


(Reprinted by permission of Integrated Device Technology, Copyright IDT, 1993) 


The read cycle time not being equal to the access time is one of the major 
differences between SRAM and DRAM. Although in SRAM the read cycle time is equal to the 
access time, in DRAM of standard mode the read cycle time is about twice the access time 
normally advertised (tACC). This could make a difference in the total time spent by the 
CPU to access memory. Look at Examples 22-2 and 22-3. From the above discussion and 


Example 22-2 


Compare the minimum CPU time needed to read 150 random memory locations of a given bank in 
each of the following. 
(a) DRAM with tACC = 100 ns and tRC = 190 ns 
(b) SRAM of tACC = 100 ns 


Solution: 
(a) DRAM requires 190 ns to access each location. Therefore, a total of 150 x 190 = 28,500 ns 
would be spent by the CPU to access all those 150 memory locations. 
(b) In the case of SRAM, the CPU spends only 150 x 100 ns = 15,000 ns. No more time is 
needed since T access = T read cycle (tACC = tRC). 
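
The comparison in this example reduces to multiplying the number of reads by the cycle time of each memory type, where the DRAM cycle time is the access time plus the precharge time. A hedged Python sketch, with illustrative function names and the 90-ns precharge figure taken from the discussion that precedes the example:

```python
def dram_cycle_ns(t_rac_ns, t_rp_ns):
    """Standard mode DRAM read cycle: tRC = tRAC + tRP (approximate)."""
    return t_rac_ns + t_rp_ns


def total_read_ns(reads, cycle_ns):
    """Minimum time for back-to-back reads, each costing one full cycle."""
    return reads * cycle_ns


dram_total = total_read_ns(150, dram_cycle_ns(100, 90))  # cycle = 190 ns
sram_total = total_read_ns(150, 100)                     # cycle = access time
print(dram_total, "ns for DRAM,", sram_total, "ns for SRAM")
```

The two memories have the same advertised access time, yet the DRAM takes nearly twice as long for the same work because of its precharge time.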




Figure 22-1. DRAM vs. SRAM Timing (timing diagrams for static RAM and for standard mode DRAM. In SRAM, tRC = tAC; in DRAM, tRC is approximately 2 x tRAC. tRC = read cycle time, tAC = access time, tRAC = access time from RAS, tCAC = access time from CAS, tRP = RAS precharge time) 


Example 22-3 


Calculate the time to access 1024 random bits of a 1M x 1 chip if tRAC = 85 ns and tRC = 165 ns. 
Solution: 


For standard mode (also called random) we have the following for reading 1024 bits: 
time to read 1024 random bits = 1024 x tRC = 1024 x 165 ns = 168,960 ns 


Example 22-2 we can conclude that for successive accesses of random locations inside the 
DRAM the CPU must spend a minimum of tRC time on each access. See Figure 22-1 for 
DRAM and SRAM timing. 


DRAM interfacing using the interleaving method 


One of the methods used to overcome the problem of precharge time in DRAMs 
is the interleaving method of DRAM interfacing. In this method, two sets of banks are 
placed next to each other and the CPU accesses each set of banks alternately. In this way 
the precharge time of one set of banks is hidden behind the access time of the other one. 
This means that while the CPU is accessing one set of banks, the other set is being 
precharged. Look at Figure 22-2. Assume that the 80386SX is working at a frequency of 
20 MHz; therefore, the CPU has a memory cycle time of 100 ns. Using DRAM with an access 
time of 70 ns and a precharge time of 65 ns gives a DRAM cycle time of 135 ns (70 + 65 = 
135). This is much longer than the 100 ns provided by the CPU. Using interleaved memory 
design can solve this problem. In this case when the 386SX accesses bank set A, it 
goes on to access bank set B while set A takes care of its precharge time. Similarly, when 
the CPU accesses set A, the set B banks will have time to precharge. 
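
One way to check whether interleaving rescues a given DRAM is sketched below. It is a simplification made for illustration: it assumes strictly alternating accesses between the sets and ignores decoder and data path delays, and the function name is hypothetical.

```python
def interleave_ok(cpu_cycle_ns, t_rac_ns, t_rp_ns, sets=2):
    """True if the DRAM keeps up with the CPU when accesses rotate over `sets`.

    Each access must return data within one CPU memory cycle, and a given
    set is revisited only once every `sets` CPU cycles, which is the time
    it has to complete its access plus precharge.
    """
    dram_cycle_ns = t_rac_ns + t_rp_ns
    return t_rac_ns <= cpu_cycle_ns and dram_cycle_ns <= sets * cpu_cycle_ns


# 20-MHz 386SX: 100-ns memory cycle; DRAM with 70-ns access, 65-ns precharge
print(interleave_ok(100, 70, 65, sets=1))   # single set: 135 ns > 100 ns
print(interleave_ok(100, 70, 65, sets=2))   # two sets: 135 ns <= 200 ns
```

With a single set the 135-ns DRAM cycle does not fit in the 100-ns CPU cycle, while with two sets each set has 200 ns between visits.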


Figure 22-2. Interleaved DRAM Organization (diagram of two sets of banks; address labels shown: 000003, 000001, 000007, 000005, 00000B, 000009) 


Example 22-4 


Show the time needed to access all 1024 memory locations of Example 22-3 if the interleaved 
method of memory interfacing is used. 
Solution: 


In the interleaved method, since the precharge time of one bank is hidden behind the access time of 
the other bank, each memory location is accessed in tRAC as far as the CPU is concerned; therefore, 
1024 x 85 = 87,040 ns is the total amount of time spent by the CPU to access 1024 locations. 
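
The hiding of precharge time can also be seen with a small timeline simulation. This is an illustrative model only: it assumes tRP = tRC - tRAC = 80 ns for the chip of Example 22-3, treats each access as taking exactly tRAC, and uses hypothetical names throughout.

```python
def access_time_ns(set_sequence, t_rac_ns, t_rp_ns):
    """Total time for a series of accesses, given which set each one hits."""
    ready_at = {}   # set id -> time at which its precharge has finished
    now = 0
    for s in set_sequence:
        start = max(now, ready_at.get(s, 0))  # stall if still precharging
        now = start + t_rac_ns                # data available after tRAC
        ready_at[s] = now + t_rp_ns           # set then precharges for tRP
    return now


alternating = access_time_ns(["A", "B"] * 512, t_rac_ns=85, t_rp_ns=80)
same_set = access_time_ns(["A"] * 1024, t_rac_ns=85, t_rp_ns=80)
print(alternating, same_set)
```

When accesses alternate, no stall ever occurs and the total is 1024 x 85 ns. When every access hits the same set, the total approaches 1024 x tRC; the model comes out one precharge time shorter than the figure in Example 22-3 because it stops the clock when the last data item arrives rather than when the last precharge ends.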


Interleaved drawback 


The major drawback of interleaved memory is memory expansion. In expanding 
the memory based on the interleaved method, a minimum of two sets of banks must be 
added every time additional memory is required. Look at Example 22-5. Many inexpen- 
sive embedded systems based on 386SX, 386DX, and 486SX of 16-25 MHz frequency 
use the interleaved memory design method to avoid using expensive cache memory with- 
out sacrificing performance. 




Example 22-5 


Assume that we are using 1M x 1 DRAM organization in Figure 22-2. If each set is 4 megabytes, 
find the following. 
(a) the chip count (b) the minimum memory addition and the chip count 


Solution: 


(a) Assuming 1M x 9 for each bank where each bank takes care of 8 bits of data, there are 9 chips 
for every byte. That means a total of 36 DRAM chips for each set, or a total of 72 1M x 1 chips for 
the first 8 megabytes of interleaved memory. 

(b) From then on, any memory addition must be in multiples of 4 megabytes since each set needs 
2M; therefore, we need another 36 1M x 1 DRAM chips to raise the total memory of the system to 
12M. 


Example 22-6 


A 386SX embedded system has 1M of DRAM installed using the interleaved design method. Show the 
memory organization and DRAM chip count assuming that only 256K x 1 and 256K x 4 DRAM chips are used. 


Solution: 

Since the 386SX has a 16-bit data bus, it uses 512K bytes for each set of A and B, or four banks of 
256K x 9, where each set consists of two banks of 256K x 9. Therefore, the total chip count is 12 
since each bank uses three chips (two 256K x 4 and one 256K x 1 for parity bit). This is shown as 
follows. 


(Diagram: each bank is built from two 256K x 4 chips for the data bits and one 256K x 1 chip for the parity bit, DP.) 


Example 22-7 


Show the minimum memory addition and the chip count for Example 22-6. Assume that the avail- 
able DRAM chips are 256K x 1 and 256K x 4. 


Solution: 
The minimum memory addition is 1M. Since we have two banks for each set of interleaved mem- 
ory, we have two 256K x 4 and one 256K x 1 for parity, which means three chips for each bank. 
Therefore, the minimum memory addition requires 12 chips, eight of which are 256K x 4 and four 
are 256K x 1 for parity bits, resulting in 1 megabyte. 
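
The chip counts in Examples 22-5 through 22-7 follow from the bus width, the number of interleaved sets, and the number of chips used to build one byte-wide bank with parity. A minimal sketch of that bookkeeping, with illustrative function and parameter names:

```python
def interleaved_chip_count(data_bus_bits, chips_per_bank, sets=2):
    """Return (banks, chips) for one full group of interleaved sets.

    Each bank is one byte wide plus a parity bit, so a set needs
    data_bus_bits / 8 banks; chips_per_bank depends on the chips chosen.
    """
    banks_per_set = data_bus_bits // 8
    banks = sets * banks_per_set
    return banks, banks * chips_per_bank


def memory_bytes(banks, bank_depth):
    """Data capacity in bytes (parity bits are not counted)."""
    return banks * bank_depth


# Example 22-6: 16-bit 386SX, bank = two 256K x 4 plus one 256K x 1
banks, chips = interleaved_chip_count(16, chips_per_bank=3)
print(banks, "banks,", chips, "chips,", memory_bytes(banks, 256 * 1024), "bytes")

# Example 22-5(a): 4 banks per set, bank = nine 1M x 1 chips
print(interleaved_chip_count(32, chips_per_bank=9))
```

Since memory can grow only by a whole group of sets, the same numbers also give the minimum memory addition.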


Page mode DRAM 


The storage cells inside DRAM are organized in a matrix of N rows and N 
columns. In reading a given cell, the address for the row (A1-An) is provided first and 
RAS is activated; then the address for the column (A1-An) is provided and CAS is 
activated. In DRAM literature the term page refers to a number of column cells in a given 
row. See Examples 22-8 and 22-9. 




Example 22-8 


Show how memory storage cells are organized in each of the following DRAM chips. 
(a) 256K x 1 (b) 1M x 1 (c) 4M x 1 


Solution: 


(a) The 256K x 1 has 9 address pins (A0-A8); therefore, cells are organized in a matrix of 2^9 x 2^9 
= 512 x 512, giving 512 rows, each consisting of 512 columns of cells. 

(b) 1024 x 1024 

(c) 2048 x 2048 


Example 22-9 


Assuming that the DRAMs in Example 22-8 are of page mode, show how each chip is organized 
into pages. Find the number of columns per page for (a), (b), and (c). 


Solution: 


(a) For 256K x 1 we have 512 pages, where each page has 512 columns of cells. 
(b) 1024 pages, where each page has 1024 bits (columns). 
(c) 2048 pages each of 2048 bits 
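
The organization of the three chips can be derived mechanically: the row and column addresses share the same pins, so half of the address bits select the row (page) and the other half select the column. A short sketch under the assumption of a square matrix, with an illustrative function name:

```python
def dram_matrix(cells):
    """Return (address_pins, rows, columns) for a square N x N DRAM array."""
    address_bits = cells.bit_length() - 1
    if cells != 1 << address_bits or address_bits % 2:
        raise ValueError("cell count must be an even power of two")
    pins = address_bits // 2          # multiplexed row/column address pins
    side = 1 << pins
    return pins, side, side


for label, cells in [("256K x 1", 256 * 1024), ("1M x 1", 1024 * 1024), ("4M x 1", 4 * 1024 * 1024)]:
    pins, rows, cols = dram_matrix(cells)
    print(f"{label}: {pins} address pins, {rows} pages of {cols} bits")
```

The number of rows is the number of pages, and the number of columns is the page size in bits.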


The idea behind page mode is that since memory locations are accessed 
consecutively in most situations, there is no need to provide both the row and column address for 
each location, as was the case in DRAM with standard timing. Instead, in page mode, first 
the row address is provided, RAS latches in the row address, and then the column 
addresses are provided and CAS toggles back and forth, latching in the column addresses until 
the last column of a given page is accessed. Then the address of the next row (page) is 
provided and the process is repeated. While the access time of the first cell is the standard 
access time using both row and column (tRAC), the access time in accessing the second cell 
through the last cell of the same page (row) is much shorter. This access time is often referred 
to as tCAC (T of column access). In page mode DRAM when we are in a given page, each 
successive cell can be accessed no faster than tPC (page cycle time). See Figure 22-3. 
Table 22-4 gives page mode timing parameters. In DRAM of page mode both the standard 
mode and page mode are supported. 
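
From this description, the time to read cells within one page can be approximated as one standard access followed by page cycles. The sketch below is an approximation under stated assumptions: all cells lie in the same page, the cost of moving to a new row is ignored, and the function name is illustrative.

```python
def page_mode_read_ns(cells, t_rac_ns, t_pc_ns):
    """Approximate time to read `cells` consecutive cells of one page.

    The first cell costs the standard access time tRAC; each of the
    remaining cells costs one page cycle tPC.
    """
    if cells < 1:
        raise ValueError("at least one cell must be read")
    return t_rac_ns + (cells - 1) * t_pc_ns


# A full 1024-bit page with tRAC = 85 ns and tPC = 50 ns
print(page_mode_read_ns(1024, 85, 50), "ns")
```

Compared with 1024 x tRC in standard mode, nearly all of the time is now governed by the much shorter page cycle.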


Table 22-4: Page Mode DRAM Timing Parameters (4M x 1) 


| Page Mode DRAM | Access Time from RAS, tRAC (ns) | Read Cycle Time, tRC (ns) | Access Time from CAS, tCAC (ns) | Page Cycle Time, tPC (ns) | 
| MCM44100-60 | 60 | 110 | | | 
| MCM44100-70 | | | | | 
| MCM44100-80 | | | | | 


(Reprinted by permission of Motorola Corporation, Copyright Motorola Corp. 1993) 


Static column mode 


Static column mode makes accessing all the columns of a given row much sim- 
pler by eliminating the need for CAS. In this mode, the first location is accessed with a 
standard read cycle where the row address is latched by RAS followed by the column 
address and what is called CS (chip select) clock. From then on, CS is incremented inter- 
nally. As long as RAS and CS remain low, the contents of successive cells appear at the 
data output pin of DRAM until the last column of a given row is accessed; then the 




Figure 22-3. DRAM Page and Static Column Modes (timing diagrams for page mode and static column mode) 


process is moved to the next row. This means that the initial access time of the first cell is 
the standard access time (tg,c), but each subsequent column in that row is accessed in a 
time called t,, (access time from column address). Due to the fact that there is no setup 
and hold time for column address select (CAS), the use of static-column-mode DRAM 
lends itself to memory design of high-frequency systems. A large percentage of 80386 and 
higher processor computers use static column DRAM for main memory. 

In static column mode where the initial standard access time is tgac, When we are 
in a given page, any cell can be accessed with the access time of t,,, but all the succes- 
sive bits can be accessed no faster than t,, (static column cycle time). See 
Figure 22-3. Table 22-5 gives static column mode timing parameters. 

Table 22-5: Static Column DRAM Timing Parameters (4M x 1)


RAS Access,      Read Cycle,      Column Access,     Cycle Time,
tRAC (ns)        tRC (ns)         tAA (ns)           tSC (ns)


(Reprinted by permission of Motorola Corporation, Copyright Motorola Corp., 1993)


Example 22-10


Calculate the total time spent by the CPU to access an entire page of memory if the memory banks
are page mode DRAM of 1M x 1 with tRC = 165 ns, tRAC = 85 ns, and tPC = 50 ns.


Solution:
For page mode we have the following for reading 1024 bits:


Time to read 1024 bits of the same page = tRAC + 1023 x tPC
= 85 ns + 1023 x 50 ns = 51,235 ns


Example 22-11


Calculate the total time spent by the CPU to access the entire page of memory if the memory banks
are static-column-mode DRAMs of 1M x 1 with tRC = 165 ns, tRAC = 85 ns, and tSC = 50 ns.


Solution:
For static column mode we have the following for reading 1024 bits:


Time to read 1024 bits of the same page = tRAC + 1023 x tSC
= 85 ns + 1023 x 50 ns = 51,235 ns


Comparing Examples 22-10 and 22-11, if the time spent by the CPU is the same
for both the page mode and static column mode, what is the advantage of static column
mode? The answer is that static-column-mode DRAM design is simpler since there is no
circuit or timing requirement for the CAS pin. Notice in Figure 22-3 that we need to keep
both RAS and CS (chip select) low in order to access successive cells. Here is what
Motorola (now Freescale) Application Note AN986 says about the superiority of
static-column-mode DRAM: "This mode is useful in applications that require less noise than
page mode. Output buffers are always on when the device is in this mode and the CS clock
is not cycled, resulting in fewer transients and simpler operation.... Static column consists
of changing column addresses while holding the RAS and CS clocks active."
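
The arithmetic of Examples 22-10 and 22-11 can be checked with a short Python sketch. The function names are illustrative only, and the standard-mode figure rests on an assumption not worked out in the examples: that every standard-mode access costs one full read cycle (tRC).

```python
def open_page_read_ns(first_access_ns, cycle_ns, cells):
    """Time to read `cells` consecutive cells of one open page (row).

    The first cell costs the full access time (tRAC); every later cell
    costs one page cycle (tPC) or one static column cycle (tSC).
    """
    if cells < 1:
        raise ValueError("cells must be at least 1")
    return first_access_ns + (cells - 1) * cycle_ns


def standard_read_ns(read_cycle_ns, cells):
    """Assumed model: each standard-mode access takes a full read cycle (tRC)."""
    return cells * read_cycle_ns


page_mode = open_page_read_ns(85, 50, 1024)       # tRAC = 85 ns, tPC = 50 ns
static_column = open_page_read_ns(85, 50, 1024)   # tRAC = 85 ns, tSC = 50 ns
standard = standard_read_ns(165, 1024)            # tRC = 165 ns

print(page_mode, static_column, standard)         # 51235 51235 168960
```

Under these assumptions the two fast modes take the same time, which is why the choice between them comes down to circuit simplicity and noise rather than speed.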


Timing comparison of DRAM modes 


A summary of DRAM timing is given in Table 22-6. Much of this material is 
taken from Motorola (Freescale) Application Note AN986. 


Table 22-6: Timing for 1M x 1 85 ns DRAM Chip


                                   Static Column
Access time, tRAC                  85
Read cycle time, tRC
Page mode cycle time, tPC
Static column cycle time, tSC


(Reprinted by permission of Motorola Corporation, Copyright Motorola Corp. 1993)




This concludes the discussion of DRAM operation modes. In many systems, one of the
above modes is implemented in order to avoid inserting wait states on every DRAM access.
As seen from the above discussion, even the best of these modes cannot eliminate wait
states entirely unless SRAM is used for the entire memory, which is prohibitively
expensive. The best solution is to use a combination of SRAM and DRAM, which is
discussed next.


Review Questions 


1. In which type(s) of memory is the read cycle time equal to the memory access 
time? 

2. A given DRAM is advertised to have an access time of 50 ns. What is the approxi- 
mate memory cycle time for this DRAM? 

3. A given DRAM has a 120-ns memory read cycle time. What is its access time 
(trac)? 

4. In DRAM, a read cycle consists of ___ and ___.

5. Assume an 80386 of interleaved memory with 2M bytes initial DRAM for each of 
the following. 
(a) Show how the banks are organized. 
(b) What is the minimum memory addition? 

6. True or false. In page mode, the initial read takes trac. 

7. For page mode DRAM, while we are in a given page, we can access successive
memory locations no faster than ___.

8. Calculate the time the CPU must spend to access 100 locations all within the same 
page if trac = 60 ns and tpc = 30 ns. 

9. The higher the system frequency, the less noise can be tolerated in the system. 
Which is preferable in a 20-MHz system, static column or page mode DRAM? 


SECTION 22.3: CACHE MEMORY 


The most widely used memory design for high-performance CPUs implements
DRAMs for main memory along with a small amount (compared to the size of main
memory) of SRAM for cache memory. This takes advantage of the speed of SRAM and the
high density and low cost of DRAM. As mentioned earlier, implementing the entire
memory of the computer with SRAM is too expensive, and using all DRAM degrades
performance. Cache memory is placed between the CPU and main memory. See Figure 22-4.

When the CPU initiates a memory access, it first asks cache for the information
(data or code). At any given time the cache controller has knowledge of which information
is kept in cache; therefore, upon a request by the CPU, the address issued by the CPU is
compared with the addresses of the data kept by the cache controller. If the requested
information is in cache (a hit), it is presented to the CPU with zero wait states (WS). If
it is not in cache (a miss), the cache controller along with the memory controller fetches
it from main memory and presents it to the CPU, while keeping a copy of it in cache. The
reason a copy of data (or code) fetched from main memory is kept in the cache is to allow
any subsequent request for the same information to result in a hit and be provided to the
CPU with zero wait states.

In most computers with cache, the hit rate is 85% and higher. By combining
SRAM and DRAM, cache memory's access time matches the memory cycle of the CPU.
In 80386/486 systems with a frequency of 33 MHz and above, the use of cache
is absolutely essential. For example, in a 33-MHz 80386-based computer with only a
60-ns read cycle time, only static RAM with an access time (cycle time) of 45 ns can
provide the needed information to the CPU without inserting wait states. We have assumed
that 15 ns (60 - 45 = 15) is used for the delay associated with the address and data path.
To implement the entire 16M bytes of main memory of a 33-MHz 386/486 system with 45-ns
SRAM is not only too expensive, but the power dissipation associated with such a large
amount of SRAM would require a complex cooling system of the kind used only for expensive
mini- and mainframe computers. The problem gets worse if we use a 486 of 50 MHz or a
Pentium of 60 MHz.


Figure 22-4. CPU and Its Relation to Various Memories (CPU with on-chip cache for 486 and
Pentium, secondary cache in SRAM, main memory in DRAM, and hard disk, connected by the
address and data buses)

It must be noted that when the CPU accesses memory, it is most likely to access
information in the vicinity of the same addresses, at least for a time. This is called the
principle of locality of reference. In other words, even for a short program of 50 bytes, the
CPU is accessing those 50 memory locations from cache with zero wait states. If it were
not for this principle of locality, that is, if the CPU accessed memory randomly, the
idea of cache would not work. This implies that JMP and CALL instructions that transfer
control to distant addresses are bad for the performance of cache-based systems. The hit
rate, the number of hits divided by the total number of tries, depends on the size of the
cache, how it is organized (cache organization), and the nature of the program.
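
To see why the hit rate matters so much, the following Python sketch computes the hit rate as defined above and a weighted-average access time. The averaging formula is the usual textbook one rather than something stated in this section, and the 45-ns and 165-ns figures in the usage lines are illustrative assumptions only.

```python
def hit_rate(hits, tries):
    """Hit rate = number of hits divided by the total number of tries."""
    if tries <= 0:
        raise ValueError("tries must be positive")
    return hits / tries


def average_access_ns(rate, hit_ns, miss_ns):
    """Weighted average of the hit time and the miss time (assumed model)."""
    return rate * hit_ns + (1.0 - rate) * miss_ns


rate = hit_rate(85, 100)                      # 0.85
# Assumed numbers: 45 ns for a cache hit, 165 ns when main memory is needed.
print(round(average_access_ns(rate, 45, 165), 2))
```

Even a modest drop in hit rate pulls the average toward the slow DRAM figure, which is why cache size and organization receive so much attention.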


Cache organization 


There are three types of cache organization: 


1. fully associative 
2. direct mapped 
3. set associative 


The following is a discussion of each organization with its advantages and disad- 
vantages. For the sake of clarity and simplicity, an 8-bit data bus and a 16-bit address bus 
are assumed. 


Fully associative cache 


In fully associative cache, only a limited number of bytes from main memory are
held by cache along with their addresses. The SRAMs holding data are called data cache
and the SRAMs holding addresses of the data are called tag cache. This discussion
assumes that the microprocessor is sending a 16-bit address to access a memory location
that has 8 bits of data and that the cache is holding 128 of the possible 65,536 (2^16)
locations. This means that the width of the tag is 16 bits since it must hold the address,
and that the depth is 128. When the CPU sends out the 16-bit address, it is compared with
all 128 addresses kept by the tag. If the address of the requested data matches one of the
addresses held by the tags, the data is read and is provided to the CPU (a hit). If it is not
in the cache (a miss), the requested data must be brought in from main memory to the CPU
while a copy of it is given to cache. When the information is brought into cache, the
contents of the memory locations and their associated addresses are saved in the cache (tag
cache holds the address and data cache holds the data).

In fully associative cache, the more data that is kept, the higher the hit rate. An
analogy is that the more books you have on a table, the better the chance of finding the
book you want on the table before you look for it on the book shelf. The problem with
fully associative cache is that if the depth is increased to raise the hit rate, the number
of comparisons becomes too time consuming and inefficient. For example, a fully
associative cache with a depth of 1024 requires 1024 comparisons, and that is too time
consuming even for fast comparators. On the other hand, with a depth of 16 the CPU ends up
waiting for data too often. This is because the cache controller is swapping information in
and out of cache, since its size is too small, and it must save the present data in the
cache before it can bring in new data. This replacement policy is discussed later. In the
above example of 128 depth, the amount of SRAM for tag is 128 x 16 bits and 128 x 8 for
data, that is, 256 bytes for tag and 128 bytes for data cache for a total of 384 bytes.
Although the above example used a total of 384 bytes of SRAM, it is said that the system
has 128 bytes of cache. In other words, the data cache size is what is advertised. The SRAM
inside the cache controller provides the space for storing the tag bits. Tag bits are not
included in cache size. In Figure 22-5, DRAM location F992 contains data 85H. The left
portion of the figure shows when the data is moved from DRAM to cache.
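
A minimal Python sketch of the fully associative lookup described above follows. The class and function names are hypothetical, main memory is modeled as a dictionary, and the oldest entry is dropped when the cache is full; that replacement choice is an assumption made only to keep the sketch short.

```python
class FullyAssociativeCache:
    """Tag cache holds full 16-bit addresses; data cache holds the bytes."""

    def __init__(self, depth=128):
        self.depth = depth
        self.tags = []   # tag cache: one full address per entry
        self.data = []   # data cache: one byte per entry

    def read(self, address, main_memory):
        """Return (value, hit). Every stored tag is compared with the address."""
        for i, tag in enumerate(self.tags):
            if tag == address:
                return self.data[i], True          # hit
        value = main_memory[address]               # miss: go to main memory
        if len(self.tags) == self.depth:           # assumed policy: drop oldest
            self.tags.pop(0)
            self.data.pop(0)
        self.tags.append(address)                  # keep a copy for next time
        self.data.append(value)
        return value, False


def sram_bytes(depth, address_bits=16, data_bits=8):
    """Return (tag bytes, data bytes, total bytes) of SRAM for the cache."""
    tag = depth * address_bits // 8
    data = depth * data_bits // 8
    return tag, data, tag + data


memory = {0xF992: 0x85}
cache = FullyAssociativeCache()
print(cache.read(0xF992, memory))   # (133, False): first access is a miss
print(cache.read(0xF992, memory))   # (133, True): second access is a hit
print(sram_bytes(128))              # (256, 128, 384)
```

The loop in `read` is the reason depth cannot grow freely: each request costs one comparison per stored tag.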


Figure 22-5. Fully Associative Cache (Tag Cache = 128 x 16, Data Cache = 128 x 8)


Direct-mapped cache


Direct-mapped cache is the opposite extreme of fully associative. It requires only
one comparison. In this cache organization, the address is divided into two parts: the index
and the tag. The index is the lower part of the address, which is directly mapped into
SRAM, while the upper part of the address is held by the tag SRAM. In the example of
Figure 22-6, A0 to A10 are the index and A11 to A15 are the tag. Assuming that the CPU
addresses location F7A9H, the 7A9 goes to the index, but the data is not read until the
contents of tag location 7A9 are compared with 11110B. If it matches (its content is 11110),
the data is read to the CPU; otherwise, the microprocessor must wait until the contents of
location F7A9 are brought from main memory DRAM into the CPU while a copy of it is issued
to cache for future reference. There is only one unique location with index address of 7A9,
but 32 possible tags (2^5 = 32). Any of these possibilities, such as C7A9, 27A9, or 57A9,
could be in tag cache. In such a case, when the tag of a requested address does not match
the tag cache, a cache miss occurs. Although the number of comparisons has been reduced
to one, the problem of accessing information from locations with the same index but
different tag, such as F7A9 and 27A9, is a drawback. The SRAM requirement for this cache
is as follows. While the data cache is 2K bytes, the tag requirement is 2K x 5 = 10K
bits or about 1.25K bytes. See Figure 22-6.
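
The tag/index split and the single comparison can be sketched in Python as follows. The names are hypothetical and main memory is again modeled as a dictionary; the sketch only illustrates the 11-bit index and 5-bit tag of the example above.

```python
INDEX_BITS = 11   # A0-A10 are the index; A11-A15 are the tag


def split_address(address, index_bits=INDEX_BITS):
    """Return (tag, index) for a 16-bit address."""
    index = address & ((1 << index_bits) - 1)
    tag = address >> index_bits
    return tag, index


class DirectMappedCache:
    def __init__(self, index_bits=INDEX_BITS):
        self.index_bits = index_bits
        size = 1 << index_bits                 # 2K entries
        self.tags = [None] * size              # tag cache, 2K x 5
        self.data = [None] * size              # data cache, 2K x 8

    def read(self, address, main_memory):
        """Return (value, hit) using exactly one tag comparison."""
        tag, index = split_address(address, self.index_bits)
        if self.tags[index] == tag:
            return self.data[index], True      # hit
        value = main_memory[address]           # miss: fetch and keep a copy
        self.tags[index] = tag
        self.data[index] = value
        return value, False


tag, index = split_address(0xF7A9)
print(format(tag, "05b"), format(index, "X"))   # 11110 7A9
```

Reading F7A9 and then 27A9 with this class shows the drawback named above: both map to index 7A9, so each one evicts the other.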


Figure 22-6. Direct-Mapped Cache (tag = A15-A11, index = A10-A0; Data Cache = 2K x 8,
2K bytes for data cache)


Set associative


This cache organization is in between the extremes of fully associative and direct
mapped. While in direct mapped there is only one tag for each index, in set associative
the number of tags for each index is increased, thereby increasing the hit rate. In 2-way
set associative, there are two tags for each index, and in 4-way there are four tags for
each index. See Figures 22-7 and 22-8. Comparing direct-mapped and 2-way set associative,
one can see that with only a small amount of extra SRAM, a better hit rate can be
achieved. In this organization, if the microprocessor is requesting the contents of memory
location 41E6H, there are two possible tags that could hold it, since cache circuitry will
access index 1E6H and compare the contents of both tags with "0100 00". If either of them
matches, the data of index location 1E6 is read to the CPU, and if neither of the tags
matches "0100 00", the miss will force the cache controller to bring the data from DRAM
to cache, while a copy of it is provided to the CPU at the same time. In 4-way set
associative, the search for the block of data starting at 41E6 is initiated by comparing
the four tags with "0100 000", which will increase the chance of having the data in the
cache by 50%, compared with 2-way set associative. As seen in the above example, the number
of comparisons in set associative depends on the degree of associativity. It is 2 for 2-way
set associative, 4 for 4-way set associative, 8 for 8-way, n for n-way set associative, and
in the thousands for fully associative. The higher the degree of associativity, the better
the performance, but the amount of SRAM required for tag cache is also increased, making
the increased cost of 8-way and 16-way set associative unjustifiable compared to the small
increase in hit rate. The increase in associativity also increases the number of tag
comparisons. Most cache systems that use this organization are implemented in 4-way set
associative (e.g., 80486 on-chip cache).
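
The address split for location 41E6H can be reproduced with the Python sketch below. It follows the text's usage, in which each "set" is one tag/data pair of SRAM blocks, and assumes the 16-bit address and 2K-byte data cache of the running example; all names are hypothetical.

```python
def split_address(address, ways, data_cache_bytes=2048, address_bits=16):
    """Return (tag, tag_bits, index) for an n-way set associative cache."""
    entries_per_set = data_cache_bytes // ways       # 1K for 2-way, 512 for 4-way
    index_bits = entries_per_set.bit_length() - 1    # log2 of a power of two
    tag_bits = address_bits - index_bits
    index = address & (entries_per_set - 1)
    tag = address >> index_bits
    return tag, tag_bits, index


def lookup(tag_stores, tag, index):
    """Compare the tag with every set at this index: n comparisons for n-way."""
    for set_number, tags in enumerate(tag_stores):
        if tags[index] == tag:
            return set_number      # hit in this set
    return None                    # miss


for ways in (2, 4):
    tag, tag_bits, index = split_address(0x41E6, ways)
    print(ways, format(tag, "0{}b".format(tag_bits)), format(index, "X"))
# 2 010000 1E6
# 4 0100000 1E6
```

Going from 2-way to 4-way moves one address bit from the index to the tag, which is why the tag grows from 6 to 7 bits while the index value stays 1E6 for this address.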

From a comparison of these two cache organizations, the difference between them
in organization and SRAM requirements can be seen. In 2-way, the tag of 1K x 6 and data
of 1K x 8 for each set give 14K bits per set, for a total of 28K bits [2 x (1K x 6 + 1K x 8)
= 28K bits]. In 4-way, there is 512 x 7 for the tag and 512 x 8 for data in each set, giving
a total of 30K bits [(512 x 7 + 512 x 8) x 4 = 30K bits] of SRAM requirement. With only an
extra 2K bits, the hit rate will improve substantially. As the degree of associativity is
increased, the size of the index is reduced and added to the tag and this increases the tag
cache SRAM requirement, but the size of data cache remains the same for all cases of direct
map, 2-way, and 4-way associative. These concepts are clarified further in Examples 22-12,
22-13, and 22-14.
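
The SRAM totals can be computed directly from the tag and data dimensions. The Python sketch below is one way to do so; the function name and parameters are hypothetical, and the cache size is assumed to be a power of two.

```python
def sram_requirement_bits(ways, data_cache_bytes=2048, address_bits=16, data_bits=8):
    """Return (tag_bits_per_entry, total_tag_bits, total_data_bits)."""
    entries_per_set = data_cache_bytes // ways
    index_bits = entries_per_set.bit_length() - 1    # log2 of a power of two
    tag_bits = address_bits - index_bits
    total_tag = ways * entries_per_set * tag_bits
    total_data = ways * entries_per_set * data_bits
    return tag_bits, total_tag, total_data


K = 1024
for ways in (1, 2, 4):                                # 1 = direct mapped
    tag_bits, tag_total, data_total = sram_requirement_bits(ways)
    print(ways, tag_bits, tag_total // K, data_total // K, (tag_total + data_total) // K)
# 1 5 10 16 26
# 2 6 12 16 28
# 4 7 14 16 30
```

The data cache stays at 16K bits (2K bytes) in every case; only the tag SRAM grows, by 2K bits for each doubling of associativity. Calling the same function with `data_cache_bytes=256 * 1024` and `address_bits=24` gives tag caches of 192K, 224K, and 256K bytes for the direct-mapped, 2-way, and 4-way cases of a 16M main memory.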


Figure 22-7. Two-way Set Associative (tag = A15-A10, index = A9-A0; [Tag = 1K x 6,
Data = 1K x 8] for each set, 2K bytes for data cache)


Figure 22-8. Four-way Set Associative (tag = A15-A9, index = A8-A0; [Tag = 512 x 7,
Data = 512 x 8] for each set, 2K bytes for data cache)


Example 22-12


This example shows direct-mapped cache for 16M main memory.


Direct mapped: tag = A23-A18, index = A17-A0
Tag Cache = (2^18 x 6) / 8 = 192K bytes        Data Cache = (2^18 x 8) / 8 = 256K bytes


Example 22-13


This example shows 2-way set associative mapped cache for 16M main memory.


2-way set associative: tag = A23-A17, index = A16-A0
Tag Cache = 2[(2^17 x 7) / 8] = 224K bytes     Data Cache = 2[(2^17 x 8) / 8] = 256K bytes


Example 22-14


This example shows 4-way set associative mapped cache for 16M main memory.


4-way set associative: tag = A23-A16, index = A15-A0
Tag Cache = 4[(2^16 x 8) / 8] = 256K bytes     Data Cache = 4[(2^16 x 8) / 8] = 256K bytes


Updating main memory 


In systems with cache memory, there must be a way to make sure that no data is 
lost and that no stale data is used by the CPU, since there could be copies of data in two 
places associated with the same address, one in main memory and one in cache. A sound 
policy on how to update main memory will ensure that a copy of any new data written into 
cache will also be written to main memory before it is lost since the cache memory is 
nothing but a temporary buffer located between the CPU and main memory. To prevent 
data inconsistency between cache and main memory, there are two major methods of 
updating the main memory: (1) write-through and (2) write-back. The difference has to do 
with main memory traffic. 


Write-through 


In write-through, the data will be written to cache and to main memory at the 
same time. Therefore, at any given time, main memory has a copy of valid data contained 
in cache. At the cost of increasing bus traffic to main memory, this policy will make sure 
that main memory always has valid data, and if the cache is overwritten, the copy of the 
latest valid data can be accessed from main memory. See Figure 22-9. 


Write-back (copy-back) 


In the write-back (sometimes called copy-back) policy, a copy of the data is written
to cache by the processor and not to main memory. The data will be written to main
memory by the cache controller only if cache's copy is about to be overwritten by other
data. The cache has an extra bit called the dirty bit (also called the altered bit). If data
is written to cache, the dirty bit is set to 1 to indicate that the cache data is new data
that exists only in cache and not in main memory. At a later time, the cache data is written
to main memory and the dirty bit is cleared. In other words, when the dirty bit is high it
means that the data in cache has changed and is different from the corresponding data in
main memory; therefore, the cache controller will make sure that before erasing the new
data in cache, a copy of it is given to main memory. Getting rid of information in cache is
often referred to as cache flushing. This updating of the main memory at a convenient time
can reduce the traffic to main memory so that main memory buses are used only if cache has
been altered. If the cache data has not been altered and is the same as main memory, there
is no need to write it again and thereby increase the bus traffic as is the case in the
write-through policy. See Figure 22-9.


Figure 22-9. Method of Updating Main Memory (Write-through: writes to both at the same
time. Write-back: cache controller will write to main memory at a convenient time.)
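
The difference in main memory traffic between the two policies can be illustrated with a small Python sketch. Both classes are hypothetical models that track one cached value per address and count main memory writes; they are not a description of any particular cache controller.

```python
class WriteThroughCache:
    """Every write goes to cache and to main memory at the same time."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.memory_writes = 0

    def write(self, address, value):
        self.lines[address] = value
        self.memory[address] = value
        self.memory_writes += 1


class WriteBackCache:
    """Writes stay in cache; main memory is updated only on replacement."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                 # address -> [value, dirty bit]
        self.memory_writes = 0

    def write(self, address, value):
        self.lines[address] = [value, True]        # set the dirty bit

    def replace(self, address):
        """Remove a line, copying it to main memory first if it is dirty."""
        value, dirty = self.lines.pop(address)
        if dirty:
            self.memory[address] = value
            self.memory_writes += 1


through, back = WriteThroughCache({}), WriteBackCache({})
for value in range(5):                  # five writes to the same location
    through.write(0x1000, value)
    back.write(0x1000, value)
back.replace(0x1000)
print(through.memory_writes, back.memory_writes)   # 5 1
```

In this model both main memories end up holding the final value, but write-back reaches that state with one bus write instead of five.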

Before concluding this section, two more cache terminologies that are common- 
ly used in the technical literature will be described: cache coherency and cache replace- 
ment policy. 


Cache coherency 


In systems in which main memory is accessed by more than one processor (DMA
or multiprocessors), it must be ensured that cache always has the most recent data and is
not in possession of old (or stale) data. In other words, if the data in main memory has
been changed by one processor, the cache of that processor will have the copy of the
latest data, and the copy held in any other processor's cache must be marked as stale
(invalid) before that processor uses it. In this way, when a processor tries to use the
stale data, it is informed of the situation. In cases where there is more than one processor
and all share a common set of data in main memory, there must be a way to ensure that no
processor uses stale data. This is called cache coherency.


Cache replacement policy 


What happens if there is no room for the new data in cache memory and the cache
controller needs to make room before it brings data in from main memory? This depends
on the cache replacement policy adopted. In the LRU (least recently used) algorithm, the
cache controller keeps account of which block of cache has gone unused for the longest
time, and when it needs room for the new data, this block will be swapped out to main
memory, or simply flushed if a copy of it already exists in main memory. This is similar
to the relation between virtual memory and main memory. The other replacement policies
are to overwrite the blocks of data in cache sequentially or randomly, or use the FIFO (first
in, first out) policy. Depending on the computer's design objective and its intended use,
any of these replacement policies can be adopted.
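
The LRU policy can be sketched in Python with an ordered dictionary that keeps blocks in order of use. The class is a hypothetical software model for illustration; real cache controllers track usage in hardware.

```python
from collections import OrderedDict


class LRUCache:
    """Holds at most `capacity` blocks and replaces the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()     # oldest use first, newest use last

    def access(self, block):
        """Return (hit, replaced_block); replaced_block is None if none was removed."""
        if block in self.blocks:
            self.blocks.move_to_end(block)          # mark as most recently used
            return True, None
        replaced = None
        if len(self.blocks) >= self.capacity:
            replaced, _ = self.blocks.popitem(last=False)   # least recently used
        self.blocks[block] = True
        return False, replaced


cache = LRUCache(capacity=2)
print(cache.access("A"))   # (False, None)
print(cache.access("B"))   # (False, None)
print(cache.access("A"))   # (True, None)
print(cache.access("C"))   # (False, 'B'): B was used least recently
```

A FIFO policy would differ only in skipping the `move_to_end` call, so that blocks leave in the order they arrived regardless of later use.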


Cache fill block size 


If the information asked for by the CPU is not in cache and the cache controller
must bring it in from main memory, how many bytes of data are brought in whenever there
is a miss? If the block size is too large (let's say 5000 bytes), the fill will be too slow
since main memory is normally accessed with 1 or 2 WS. At the other extreme, if the block
is too small, there will be too many cache misses. There must be a middle-of-the-road
approach. The size of the block transferred from main memory to the CPU (and
simultaneously copied to cache) varies in different computers, anywhere between 32 and 512
bytes. If the block size is 32 bytes, then it is called the 8-line cache refill policy,
where each line is 4 bytes of the 32-bit data bus.


Level 1, 2, and 3 caches 


With advances in IC fabrication we can put hundreds of millions of transistors
onto a single chip. This has allowed putting some caches on the CPU chip itself. When the
cache is embedded into the CPU die, it is called L1 (level 1) cache. If the cache is inside
the CPU package but outside of the CPU die, then it is called L2 (level 2) cache, whereas
cache outside the CPU residing on the motherboard is called L3 (level 3). See Chapter 24.


Review Questions 


1. Cache is made of ___ (DRAM, SRAM).

2. From which does the CPU ask for data first, cache or main memory?

3. Rank the following from fastest to slowest as far as the CPU is concerned. 
(a) main memory (b) register (c) cache memory 

4. In fully associative cache of 512 depth, there will be ___ comparisons for each 
data request. 

5. Which cache organization requires the least number of comparisons? 

6. A 4-way set associative organization requires ___ comparisons.

7. What does write-through refer to? 

8. Which one increases the bus traffic, write-through or write-back? 

9. What does LRU stand for, and how is it used? 

10. What does cache refill policy of 4 lines refer to? 


SECTION 22.4: SDRAM, DDR RAM, AND RAMBUS MEMORIES


In recent years the need for faster memory has led to the introduction of some
very high-speed DRAMs. In this section we look at three of them: EDO (extended
data-out), SDRAM (synchronous DRAM), and RDRAM (Rambus DRAM). In the mid-1990s,
the speed of x86 processors went over 100 MHz and subsequently Intel began talking
about 300-400 MHz CPUs. However, a major problem for these high-speed CPUs is the
speed of DRAM. After all, cache has to be filled with information residing in main
memory DRAM. Before we discuss some high-speed DRAMs, it needs to be noted that a
"1 GHz" CPU does not mean that the bus speed is also 1 GHz. For microprocessors over
1 GHz, the bus speed is often a fraction of the CPU speed. This is due to the expense and
difficulty (e.g., crosstalk, electromagnetic interference) associated with the design of
high-speed motherboards and the slowness of memory and logic gates. For example, in many
1 GHz Pentium systems, the bus speed is only 400 MHz.


EDO DRAM: origin and operation 


Earlier in this chapter we discussed page mode DRAM. It needs to be noted that 
page mode DRAM has been modified and now is referred to as fast page mode DRAM. 
Note that DRAM data books of the mid-1990s refer only to fast page DRAM (FPM 
DRAM) and not page mode. The following describes the operation and limitations of fast 
page DRAM and how it led to EDO DRAM. 


1. The row address is provided and latched in when RAS falls. This opens the page.
2. The column address is latched in when CAS falls and data shows up after tCAC has
elapsed. However, the next column of the same row (page) cannot be accessed faster
than tPC (page cycle time).

This means that accessing consecutive columns of an opened page is limited by tPC. The
tPC timing itself is influenced by how long CAS has to stay low before it goes up. Why
don't DRAM designers pull up CAS faster in order to shorten tPC? This seems like a very
logical suggestion. However, there is a problem with this approach in fast page mode: When
CAS goes high, the data output is turned off. So if CAS is pulled high too fast (to shorten
tPC), the CPU is deprived of the data. One solution is to change the internal circuitry of
fast page DRAM to allow the data to be available longer (even if CAS goes high). This is
exactly what happened. As a result of this change, the name EDO (extended data-out) was
given to avoid confusion with fast page mode DRAM. This is the reason that EDO is
sometimes called hyper-page since it is the hyper version of fast page DRAM. Tables
22-7 and 22-8 show a comparison of FPM and EDO DRAM timing. Notice in both
cases that all the parameters are the same except tPC. For the EDO version of page
mode, tPC is 10 ns less than in fast page mode.


Table 22-7: 70 ns 4M DRAM Timing


Speed (ns)
tRAC (ns)
tRC (ns)         130
tPC (ns)

Note: 256Kx16 DRAM
From Micron Technology


Table 22-8: 60, 50 ns 4M DRAM Timing


Speed (ns)
tRAC (ns)
tRC (ns)
tPC (ns)

Note: 256Kx16 DRAM
From Micron Technology

In examining tPC timing in Figure 22-10, notice that tPC (page cycle time) consists
of two portions: tCP (CAS precharge time) and tCAS (CAS pulse width). The tCP is similar
across 70 ns, 60 ns, and 50 ns DRAMs of FPM and EDO (about 10 ns). It is tCAS that
varies among these DRAMs. In EDO this portion is made as small as possible. Figure
22-11 compares FPM and EDO timing.


tPC = page cycle time
tCP = CAS precharge
tCAS = CAS pulse width
tPC = tCP + tCAS


tCP is the same in FPM and EDO; however, tCAS is shorter in EDO.
Figure 22-10. tPC Timing in Page Mode DRAM
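
The relation tPC = tCP + tCAS can be turned into a short Python sketch that shows how a shorter CAS pulse raises the rate at which columns of an open page can be read. The tCAS values below are hypothetical, chosen only so that the EDO page cycle comes out 10 ns shorter; they are not taken from a data sheet.

```python
def page_cycle_ns(t_cp_ns, t_cas_ns):
    """tPC = tCP + tCAS."""
    return t_cp_ns + t_cas_ns


def max_page_rate_mhz(t_pc_ns):
    """Highest rate of successive column accesses within one open page."""
    return 1000.0 / t_pc_ns


T_CP = 10                                  # about 10 ns for both FPM and EDO
fpm = page_cycle_ns(T_CP, 30)              # hypothetical FPM tCAS of 30 ns
edo = page_cycle_ns(T_CP, 20)              # hypothetical EDO tCAS of 20 ns
print(fpm, edo)                            # 40 30
print(round(max_page_rate_mhz(fpm), 1), round(max_page_rate_mhz(edo), 1))
```

With these assumed numbers the page-hit rate rises from 25 MHz to roughly 33 MHz, purely from shortening tCAS.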


Figure 22-11. Comparison of FPM and EDO Timing


SDRAM (synchronous DRAM)


When the CPU bus speed goes beyond 75 MHz, even EDO is not fast enough.
SDRAM is a memory for such systems. First, let us see why it is called synchronous
DRAM. In all the traditional DRAMs (page mode, fast page, and EDO), CPU timing is
not synchronized with DRAM timing, meaning that there is no common clock between
the CPU and DRAM for reference. In those systems it is said that the DRAM is
asynchronous with the microprocessor since the CPU presents the address to DRAM and
memory provides the data in master/slave fashion. If data cannot be provided on time, the
CPU is notified with the NOT READY signal. In response to NOT READY, the CPU inserts a
wait state into its bus timing and waits until the DRAM is ready. In other words, the CPU
bus timing is dependent upon the DRAM speed. This is not the case in synchronous
DRAM. In systems with SDRAM, there is a common clock (called the system clock) that
runs between the microprocessor and SDRAM. All bus activities (address, data, control)
between the CPU and DRAM are synchronized with this common clock. That is, the
common clock is the point of reference for both the CPU and SDRAM and there is no
deviation from it and hence no waiting by the CPU. See Figure 22-12 for SDRAM timing.
As shown in Figure 22-12, the system clock is the common clock that the address,
data, and control signals are synchronized with. As you examine the timing figures for
EDO and page mode, you will not find such a clock.


Figure 22-12. SDRAM Timing


SDRAM, DDR RAM, and burst mode 


The presence of the common system clock between the CPU and SDRAM lends 
itself to what is called burst I/O. Although burst I/O will do both read and write, we will 
discuss the read operation for the sake of simplicity. In burst read, the address of the first 
location is provided as normal. RAS is first, followed by CAS. However, since we read 
several consecutive locations in the cache fill (depending on whether the cache has 4, 8, 
16, or 32 lines), there is no need to provide the full address of each line and pay the tim- 
ing penalty for address setup and hold time. Why not simply program the burst SDRAM 
to let it know how many consecutive locations are needed according to the cache design? 
That is exactly the idea behind many SDRAMs. They are capable of being programmed 
to output up to 256 consecutive locations inside DRAM. In other words, the number of 
burst reads can be 1, 2, 4, 8, 16, or 256, and burst SDRAM can be programmed in advance 
for any number of these reads. The number of burst reads is referred to as burst length. 
In many recent SDRAMs, the burst length can be as high as a whole page. Burst read 
shortens memory access time substantially. For example, if burst length is programmed 
for 8, for the first location we need the full address of RAS followed by CAS. However, 
for the second, third, ..., eighth, we can get the data out of the SDRAM with a minimum 
delay, limited only by the internal circuitry of DRAM. Starting with the Pentium, proces- 
sors use the concept of burst read in their bus timing. In DDR (Double Data Rate) RAM 
the data is provided to the CPU on both the positive and negative edges of the clock. 


SDRAM and interleaving 


In order to increase performance, SDRAMs use the concept of interleaving
discussed in Section 22.2. In traditional interleaved design, the board designer must
arrange the DRAM in an interleaved fashion in order to hide the precharge time of one bank
behind the access time of the other one. In SDRAM, this interleaving is done internally.
In other words, inside the SDRAM itself, DRAM cells are organized in such a way that
while one bank is being precharged the other one is being accessed. By incorporating both
the burst mode and interleaving concepts into SDRAM, it is predicted that SDRAM
memory can be used for a bus frequency as high as 125 MHz but not beyond that.

Figure 22-12 shows SDRAM timing. How many clocks after CAS will the data 
appear at the data pins? This can be programmed. It is called read latency and can be 1, 
2, or 3 clocks. In Figure 22-12, the read latency is 3 since the data appears at the data buses 
3 clocks after CAS. 
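
The way read latency and burst length combine can be sketched in a few lines of Python. This is only an illustration, assuming an idealized SDRAM that delivers one data item per clock once the first item has appeared; the function name `burst_data_clocks` is hypothetical and is not taken from any SDRAM data sheet.

```python
def burst_data_clocks(read_latency, burst_length):
    """Return the clock numbers (counting the CAS clock as 0) on which
    each data item of a burst appears at the data pins.

    Assumption: the first item appears `read_latency` clocks after CAS
    and one further item follows on every clock after that.
    """
    if read_latency not in (1, 2, 3):
        raise ValueError("read latency is programmed as 1, 2, or 3 clocks")
    if burst_length < 1:
        raise ValueError("burst length must be at least 1")
    return [read_latency + i for i in range(burst_length)]


# Read latency of 3 (as in Figure 22-12) with a burst length of 4
print(burst_data_clocks(3, 4))   # [3, 4, 5, 6]
```

With a read latency of 3, the four items of the burst occupy clocks 3 through 6 after CAS, so only the first item pays the latency.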

It should be noted that SDRAM and EDO standards are both set by industry and 
every DRAM maker supports them. SDRAM and EDO are not proprietary technologies. 


Rambus DRAM 


In contrast to EDO and SDRAM, Rambus is a proprietary DRAM architecture. 
DRAM manufacturers license this technology from Rambus Inc. in exchange for royalty 
payments. DRAMs with Rambus technology are referred to as RDRAM in technical 
literature. 


Overview of Rambus technology 


The heart of Rambus technology is a proprietary interface for chip-to-chip bus 
connection. This high-speed bus technology is composed of three sections: (1) a Rambus 
interface, (2) a Rambus channel, and (3) Rambus DRAM. The Rambus interface standard 
must be incorporated into both DRAM and the CPU. While many DRAM makers are 
introducing DRAM with a Rambus interface (called RDRAM), not every microprocessor 
is equipped with a Rambus interface. Intel has indicated that it will equip future 
generations of the x86 with a Rambus interface. If a given microprocessor is 
not equipped with a Rambus interface, one can design a memory controller with the 
Rambus interface and place it between the CPU and RDRAM. Such a controller is 
referred to as a Rambus channel master and the RDRAM is called a Rambus channel 
slave. See Figures 22-13 and 22-14. 

In Rambus technology, only the master can generate a request since it contains 




Assume a bus frequency of 100 MHz. Discuss bus timing for (a) EDO of 50 ns speed where tpc = 
20 ns, (b) SDRAM of tck = 10 ns. 


Solution: 
1/100 MHz = 10 ns is the system clock. 


(a) In EDO when the page is opened, the fastest it can provide data is tpc, which is 20 ns. Therefore, 
we need at least one wait state. 

(b) In SDRAM of tck = 10 ns, the first address is strobed into the DRAM and subsequent data bursts 
are provided at 10 ns intervals. Therefore, no wait state is needed. Of course, for both of the above 
cases any bus overhead was ignored. 
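
The reasoning in this example can be checked with a short calculation. The sketch below is not a timing model; it assumes that each data transfer normally occupies one bus clock and that any extra clocks the memory needs show up as wait states. The function name `wait_states` is hypothetical.

```python
def wait_states(bus_mhz, data_interval_ns):
    """Wait states needed per data transfer.

    Assumptions: one bus clock per transfer at zero wait states, and
    all bus overhead ignored (as in the example above).
    """
    # clocks needed = ceil(interval / clock period)
    #               = ceil(interval_ns * bus_mhz / 1000), done in integers
    clocks_needed = -(-(data_interval_ns * bus_mhz) // 1000)
    return max(clocks_needed - 1, 0)


print(wait_states(100, 20))   # EDO, tpc = 20 ns   -> 1
print(wait_states(100, 10))   # SDRAM, tck = 10 ns -> 0
```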


[Figure: a Rambus channel master (controller or processor core) and Rambus channel slaves, each with a Rambus interface, joined by the Rambus channel (9-bit data bus, control, V, GND; 9 bits every 2 ns).] 

Figure 22-13. A Rambus-Based System 


[Figure: two CPU configurations, each showing a Rambus channel and a traditional bus.] 

Figure 22-14. CPUs with and without Rambus Channel (Courtesy of Rambus, Inc.) 


intelligence. Slave devices such as RDRAM respond to requests by the master. This 
eliminates any need to add intelligence circuitry to the RDRAM, which would increase 
its die size. This also means that data transfers can happen only between master and slave 
and there is never any direct data transfer between slaves. However, master capability can 
be added to devices other than the CPU such as peripheral devices, graphics processors, 
and memory controllers. The following describes additional features of Rambus channel 
technology. 




1. The Rambus channel has only a 9-bit data bus. 

2. There are only two DRAM organizations available for RDRAMs: x 8 or x 9. For 
example, a 16M-bit RDRAM is organized as 2M x 8. This is in contrast to DIMM 
memory modules where there are 72 pins for data alone. Such a large number of 
pins without a sufficient number of ground pins limits bus speed for these memory 
designs to less than 100 MHz. To reduce crosstalk and EMI (electromagnetic 
interference), we can add ground pins but that in turn makes the DRAM memory 
module too large (see Chapter 25 for the role of ground in reducing crosstalk). 

3. Since the data bus is limited to 9 pins in a Rambus channel, adding a sufficient 
number of ground pins can push the speed of the bus to 500 MHz. To counter the 
impact of a limited bus size on bus bandwidth, Rambus employs the method of 
block transfer. This is explained next. 
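
As a rough check on these figures, the peak rate implied by "9 bits every 2 ns" (Figure 22-13) can be worked out directly. The sketch below is only arithmetic on the numbers quoted in this section. Treating 8 of the 9 bits as data is an assumption suggested by the x 8 and x 9 organizations, and the function name `channel_rate` is hypothetical.

```python
def channel_rate(bits_per_transfer, transfer_period_ns):
    """Return (transfers per second, bits per second) for a channel that
    moves `bits_per_transfer` bits every `transfer_period_ns` nanoseconds."""
    transfers_per_s = 1e9 / transfer_period_ns
    return transfers_per_s, transfers_per_s * bits_per_transfer


transfers, bits = channel_rate(9, 2)
print(transfers / 1e6)        # 500.0 million transfers per second
print(bits / 1e9)             # 4.5 gigabits per second over all 9 bits
print(transfers * 8 / 8e6)    # 500.0 megabytes per second if 8 bits are data
```

A transfer every 2 ns corresponds to the 500 MHz figure given above, which is how a bus only 9 bits wide can still deliver a high peak rate.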


Rambus protocol for block transfer 


In Rambus the data is transferred in blocks. Such a block-oriented data transfer 
requires a set of protocols in which the packet types are defined very strictly. There are 
three types of packets in the Rambus protocol: (a) request, (b) acknowledge, and (c) data. 
The following steps show how the read operation works according to Rambus protocol. 
See Figure 22-15. 


[Figure: x86 processors connected through an RDRAM controller and a Rambus channel to the RDRAMs, with control lines.] 

Figure 22-15. x86 System Using Rambus DRAM (Courtesy of Rambus, Inc.) 


1. The master issues a request packet specifying the initial starting address of the 
needed data, plus the number of bytes needed to be transferred (the maximum for 
byte count is 256 bytes). This is considered one transaction. 

2. RDRAM receives the request packet and decodes the addresses and byte count. If it 
has the requested data, an acknowledge packet is sent back to the master. 

3. The acknowledge packet has three possibilities: 

(a) The addressed data does not exist. 

(b) The addressed data does exist but the RDRAM is too busy to transfer it. Try again 
later. This is called nack. 

(c) The addressed data does exist and the RDRAM is ready to transfer it. This is called 
okay. 

4. If the acknowledge packet has an okay in it, the RDRAM starts to transfer the data 
packet immediately. 
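
The four steps above can be sketched as a toy simulation. This is a minimal illustration of the request, acknowledge, and data sequence as described in the text, not the actual Rambus packet format. The class and function names, the retry limit, and the "busy for N requests" behaviour are all hypothetical.

```python
from enum import Enum


class Ack(Enum):
    NONEXISTENT = "nonexistent"   # addressed data does not exist
    NACK = "nack"                 # data exists, slave busy: try again later
    OKAY = "okay"                 # data exists, slave ready to transfer


MAX_BYTE_COUNT = 256              # maximum byte count in one request packet


class Rdram:
    """Toy slave: a byte array that reports busy for the first
    `busy_cycles` requests it receives (hypothetical behaviour)."""

    def __init__(self, contents, busy_cycles=0):
        self.contents = bytes(contents)
        self.busy_cycles = busy_cycles

    def acknowledge(self, address, count):
        if address < 0 or address + count > len(self.contents):
            return Ack.NONEXISTENT
        if self.busy_cycles > 0:
            self.busy_cycles -= 1
            return Ack.NACK
        return Ack.OKAY

    def data_packet(self, address, count):
        return self.contents[address:address + count]


def master_read(slave, address, count, max_retries=8):
    """One read transaction issued by the master."""
    if not 1 <= count <= MAX_BYTE_COUNT:
        raise ValueError("byte count must be 1..256")
    for _ in range(max_retries + 1):
        ack = slave.acknowledge(address, count)   # request -> acknowledge
        if ack is Ack.OKAY:
            return slave.data_packet(address, count)
        if ack is Ack.NONEXISTENT:
            return None
        # Ack.NACK: the slave is busy, so the master repeats the request
    raise TimeoutError("slave stayed busy")


ram = Rdram(range(64), busy_cycles=2)
print(master_read(ram, 4, 3))     # b'\x04\x05\x06'
```

Only the master starts a transaction here; the slave object never calls back into the master, which mirrors the master/slave split described earlier.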


An interesting aspect of this protocol is that the delay associated with receiving 
the acknowledge and sending the data packets can be programmed into configuration reg- 
isters of both master and slave during BIOS system initialization. 

At the time of this writing, the Rambus technology has become the standard in 
x86 PC design. 




See the following websites for the latest in memory chip and design: 
http://www.micron.com/ 


http://www.rambus.com 


Review Questions 


1. A 200-MHz Pentium has a bus frequency of ____. 
2. A 100-MHz Pentium has a bus frequency 2/3 of the CPU. What is the read cycle 
time for this processor? 
3. When a page is opened, what limits us in accessing consecutive columns? 
4. True or false. In EDO, when CAS goes up the data output is turned off. 
5. Which of the following DRAMs has a common synchronous clock with the CPU? 
(a) FPM (b) EDO (c) SDRAM (d) all of the above 
6. True or false. SDRAM incorporates interleaved memory internally. 
7. Can anyone incorporate the Rambus interface in their device? 
8. Who issues the request in a Rambus system? 
9. Who issues the acknowledge in a Rambus system? 
10. Can normal EDO or FPM DRAMs be used for the Rambus channel? 


PROBLEMS 


SECTION 22.1: MEMORY CYCLE TIME OF THE x86 


1. Calculate the memory cycle time for each of the following systems. 

(a) 386 of 33 MHz, 1 WS (b) 486 of 50 MHz, 1 WS 

(c) 386 of 25 MHz, 2 WS (d) Pentium of 60 MHz, 1 WS 

2. If the memory cycle time is 90 ns, a 386 system of 33 MHz needs ___ WS. 

3. If the memory cycle time is 80 ns, a 486 system of 50 MHz needs ___ WS. 

4. If the memory cycle time is 45 ns, a Pentium system of 66 MHz needs ___ WS. 

5. If the memory cycle time is 200 ns, a 386SX system of 20 MHz needs ___ WS. 

6. Find the effective memory performance of a 486 system of 50 MHz with 2 WS. 
Compare the performance degradation with a 0-WS system. 

7. Find the effective memory performance of a 386 system of 33 MHz with 1 WS. 
Compare the performance degradation with a 0-WS system. 

8. Find the effective memory performance of a Pentium system of 60 MHz with 1 
WS. Compare the performance degradation with a 0-WS system. 

9. If a given system with a 2-clock memory cycle time has a memory cycle time of 60 
ns and is designed with 1 WS, find the CPU frequency. 

10. In a 33-MHz, 0-WS 486 system, a minimum of 20 ns is used for data and address 
path delay and address decoding. What is the maximum memory cycle time? 




SECTION 22.2: PAGE AND STATIC COLUMN DRAMS 


11. In which memory are the cycle time and access time equal? 

12. What is the difference between the t,, and tc, SRAM data sheet? 
13. Define the memory cycle time for a memory chip. 

14. Define the memory cycle time for the CPU. 




15. What are trc and trac in DRAM? State the difference between them. 

16. Show the relation (approximate) between trc, trac, and trp. 

17. A given DRAM has trac = 60 ns. What is the trc (approximate)? 

18. A given DRAM has trac = 85 ns. What is the trc (approximate)? 

19. A given DRAM has trc = 110 ns. What is the trp (approximate)? 

20. A given DRAM has trc = 90 ns. What is the trac (approximate)? 

21. Calculate the time needed to access 2048 bits of 60-ns 4M x 1. Use Table 22-2. 

22. Calculate the time needed to access 2048 bits of 4M x 1 of 70 ns. Use Table 22-2. 

23. Draw a timing diagram for standard mode SRAM and DRAM memory cycle. 

24. What is the minimum memory addition to a 386 system with interleaved memory 
design if each bank of 8 bits is set for 1M x 8? The parity bit is not included. 

25. Calculate the chip count for Problem 24 if 1M x 4 chips are used. 

26. What is the minimum memory addition to a 486 system with interleaved memory 
design if each bank of 8 bits is set for 4M x 8? The parity bit is not included. 

27. Calculate the chip count for Problem 26 if 4M x 4 chips are used. 

28. Show the hex address for 386/486 interleaved memory banks. 

29. Calculate the time needed to access 2048 bits of one page for page mode DRAM of 
4M x 1 of 70 ns. Use Table 22-4. 

30. Calculate the time needed to access 2048 bits of one page for page mode DRAM of 
4M x 1 of 60 ns. Use Table 22-4. 

31. Calculate the time needed to access 2048 bits of one page for static column mode 
DRAM of the 4M x 1 of 70 ns. Use Table 22-5. 

32. Calculate the time needed to access 2048 bits of one page for static column mode 
DRAM of 4M x 1 of 60 ns. Use Table 22-5. 

33. True or false. In a Pentium of 1 GHz, the bus speed is also 1 GHz. 

34. True or false. In recent years, the bus speed has been falling behind the CPU speed. 


SECTION 22.3: CACHE MEMORY 


35. List the three different cache organizations. 

36. What is the principle of locality of reference? 

37. What does LRU stand for, and to what does it refer in cache memory? 

38. What do write-through and write-back refer to? Define each one and state an 
advantage and a disadvantage for each. 

39. What does a line size of 16 bytes mean? 

40. Calculate the tag and data cache sizes needed for each of the following cases if the 
memory requesting address to main memory is 20 bits (A19—A0). Assume a data 
bus of 8 bits. Draw a block diagram for each case. 

(a) fully associative of 1024 depth 

(b) direct mapped where A15—A0 is for the index 

(c) 2-way set associative where A14-A0 is for the index 

(d) 4-way set associative (e) 8-way set associative 

41. In Problem 40, compare the size of data cache and tag cache parts (b), (c), (d), and 
(e). What is your conclusion? 

42. Calculate the tag and data cache sizes needed for each of the following cases if the 
memory requesting address to main memory is 24 bits (A23—A0). Assume a data 
bus of 8 bits. Draw a block diagram for each case. 

(a) fully associative of 1024 depth 

(b) direct mapped where A19—A0 is for the index 

(c) 2-way set associative where A18—A0 is for the index 

(d) 4-way set associative (e) 8-way set associative 


43. In Problem 42, compare the size of data cache and tag cache for (b), (c), (d), and 
(e). What is your conclusion based on this comparison? 

44. Give three factors affecting the cache hit. 

45. What does the law of diminishing returns mean when applied to cache? 

46. Explain the difference between the L1 and L2 cache. 

47. True or false. In L1 cache speed is the same as CPU die speed. 


SECTION 22.4: SDRAM, DDR RAM, AND RAMBUS MEMORIES 


48. The CPU speed (in Pentium and higher processors) is often a (multiple, 
fraction) of the bus speed. 

49. Calculate the memory read cycle time of a CPU with a bus speed of 300 MHz. 
Assume a 2-clock read cycle time. 

50. In the above question, discuss the difficulties associated with the design of such a 
high-speed bus. 

51. In Pentium processors with speeds of 100 MHz and higher, the bus speed is ____ 
the CPU speed. 
(a) the same as (b) a fraction of (c) a multiple of 
52. In DRAM technology, EDO stands for ____ and FPM stands for ____. 


53. True or false. Both EDO and FPM are page mode DRAMs. 
54. What does "opening a page" mean in page mode DRAMs? What is the role of sig- 
nals RAS and CAS in opening a page? 


55. In (EDO, FPM) DRAM, the data is turned off when CAS goes high. 
56. In FPM DRAM, what happens if CAS goes high too soon and what is the conse- 
quence? 


57. When a page is opened, reading consecutive columns is limited by the speed of ____. 


58. In the design of DRAM, why is it desired to pull CAS high as soon as possible? 
59. What is the tpc for a 50-ns DRAM? 


60. In a comparison of EDO and FPM DRAM of 60-ns and 70-ns speed, which timing 
parameters are the same and which are different? 


61. For EDO DRAM, tpc is normally 10 ns ____ (less than, greater than) the tpc of 
FPM DRAM. 
62. The tpc timing is made of two parts. They are ____ and ____. One of them 
is constant across all DRAMs of 70-ns, 60-ns, and 50-ns speeds. Which one is that? 
63. What does SDRAM stand for? 
64. What is the most important difference between SDRAM and traditional DRAMs of 
FPM and EDO? 

65. The SDRAM of 75 MHz can provide data every ___ ns after a page has been 
opened. 

66. The SDRAM of 120 MHz can provide data every ___ ns after a page has been 
opened. 

67. It is predicted that SDRAM can be used for bus speeds as high as ___ MHz. 

68. What is burst mode memory? Define burst length. 

69. In SDRAM, what is the size of burst length? 

70. True or false. The x86 processors starting with the 486 support burst mode read. 

71. What is the difference between interleaved memory design on board and inter- 
leaved in SDRAM? 

72. True or false. EDO and SDRAM memories are proprietary technology requiring 
licenses. 




73. What does "RDRAM" stand for, and what is the difference between that and 
SDRAM and EDO DRAMs? 

74. Rambus technology consists of three parts. Name them. 

75. True or false. Both master and slave sections of Rambus technology must have a 
Rambus interface. 


76. A Rambus channel has a(n) ____-bit data bus. 

77. RDRAM is a ____ (master, slave) in Rambus technology. 

78. True or false. In Rambus technology, the data transfer happens between master and 
slave only. 

79. True or false. In Rambus technology the data transfer never happens between 
slaves. 

80. Name and describe the three types of packets for communication protocol in 
Rambus technology. 


81. In the above question, "okay" and "nack" are part of which packet? 

82. Explain the role of nack and okay for data transfer in Rambus technology. 

83. The bus speed in Rambus can go as high as ____ MHz. 

84. What happens if a CPU does not have a Rambus interface? 

85. True or false. EDO and SDRAM can be used in place of RDRAM in a Rambus 
channel. 


ANSWERS TO REVIEW QUESTIONS 


SECTION 22.1: MEMORY CYCLE TIME OF THE x86 


1. (a) 1/40 MHz = 25 ns; therefore, 2 x 25 = 50 ns; (b) 2 x 20 (for 0 WS) + 20 (1 WS) = 
60 ns; (c) 2 x 15 ns + 15 = 45 ns 

2. It means that the CPU cannot access memory faster than every 50 ns. 

3. (a) The read cycle time is 75 ns; therefore, the effective working frequency is the 
same as 26.6 MHz of 0 WS (1/37.5 ns = 26.6 MHz). 
(b) The read cycle time is 60 ns; therefore, the effective working frequency is the 
same as 33 MHz (1/30 ns = 33 MHz). 

4. A total of 50 ns is left for the memory access time. 

5. Since 2+ 1 WS = 3 clocks for each read cycle time, 30 ns (90/3 = 30) for the CPU 
clock duration; therefore, the CPU frequency is 33 MHz (1/30 ns = 33 MHz). 


SECTION 22.2: PAGE AND STATIC COLUMN DRAMS 


SRAM and ROM 

100 ns 

60 ns 

trac (RAS access time), trp (RAS precharge time) 

(a) There are two sets of 1M byte; therefore, each set consists of 4 banks of 256K x 
9 memory where each bank belongs to 1 byte of the D31-D0 data bus. 

(b) 2M 

True 




tec 
Total time = trac + 99 x tpc = 60 + 99 x 30 = 3030 ns 


Static column 




SECTION 22.3: CACHE MEMORY 




10. 


SRAM 

Cache 

Register, cache, and main memory 

sl 

Direct map 

4 

The CPU writes to cache and main memory at the same time when updating main 
memory. 

Write-through 

LRU (least recently used) is a cache replacement policy. When there is a need for 
room in the cache memory the cache controller flushes the LRU data to make room 
for new data. 

When the cache is filled with new data, it is done a minimum of 4 lines (4 x 4 = 16 
bytes) at a time. 


SECTION 22.4: SDRAM, DDR RAM, AND RAMBUS MEMORIES 


1. Often less than 100 MHz; many times it is only 66 MHz. 
2. 2/3 x 100 MHz = 66 MHz. Now 1/66 MHz = 15 ns. 2 x 15 ns = 30 ns read cycle 
time. 
3. The tpc (page cycle time) 
4. False 
5. SDRAM 
6. True 
7. Yes, as long as you get a license from Rambus Inc. 
8. Master (Rambus controller) 
9. RDRAM slave 
10. No. It must be RDRAM. 


CHAPTER 23 


PENTIUM AND RISC 
PROCESSORS 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> List the design enhancements of the x86 microprocessors from 80486 to 
   Pentium 4 
>> Discuss the advantages of the 5-stage pipeline over the 2-stage pipeline 
>> Explain how the burst cycle is used to shorten memory cycle times for 
   read and write operations 
>> Compare the cache sizes of x86 processors from 486 to Pentium 4 
>> List three ways that designers can increase the processing power of a CPU 
>> List design enhancements of the Pentium over previous-generation x86 
   microprocessors 
>> Describe the impact on performance of the 64-bit data bus of the Pentium 
>> Describe superscalar architecture and Harvard architecture and their use 
   in the Pentium 
>> List the unique features of RISC architecture compared to CISC and 
   describe the impact on processing speed and program development 
>> Describe the main features introduced or enhanced in the Pentium Pro 
>> Give an overview of how MMX (MultiMedia extension) technology is 
   used in some Intel processors 
>> Describe which aspects of DSP (digital signal processing) were 
   incorporated into MMX technology 
>> Code Assembly language instructions to identify the CPU 




The 8088/86 microprocessor is the product of technology of the 1970s. Advances 
made in integrated circuit technology in the 1980s made ICs with 1 million transistors 
possible. This led to the design of some very powerful microprocessors. This chapter will 
look at Intel's 486 and Pentium I microprocessors and will examine the merit of RISC 
processors and their potential power. In Section 23.1 the 80486 microprocessor is studied. 
Intel's Pentium I is discussed in Section 23.2. Section 23.3 explores RISC processors, and 
their performance is compared with that of x86 CISC processors. Section 23.4 discusses 
the main features of Intel's Pentium Pro processor. Section 23.5 describes MMX 
(MultiMedia extension) technology used for sound and graphics. 


SECTION 23.1: THE 80486 MICROPROCESSOR 


The 80486 is the first 1-million-transistor microprocessor (actually, 1.2 million) 
packaged in 168-pin PGA packaging. It is not only compatible with all previous Intel x86 
microprocessors, but is also much faster than the 80386. When Intel went from the 286 
to the 386, register widths were increased from 16 bits to 32 bits. In addition, the external 
data bus size was increased from 16 bits to 32 bits, and the address bus became 32 bits 
instead of 24 bits as in the 80286. However, the 32-bit core of the 386 microprocessor is 
preserved in the 486 microprocessor. This is due to the fact that many studies have shown 
that 32-bit registers can take care of more than 95% of the operands in high-level 
languages. Like the 386, the 486 has a 32-bit address bus and a 32-bit data bus. The data bus 
is D0-D31 and the address bus is A2-A31 in addition to BE0-BE3, just as in the 386. In 
the design of the 486, Intel uses four times as many transistors as used in the 386 to 
enhance its processing power while keeping a 32-bit microprocessor. 


Enhancements of the 486 


The following are the ways the 486 is enhanced in comparison to the 386. 


Enhancement 1 


By heavily pipelining the fetching and execution of instructions, the 486 executes 
many of its instructions in only 1 clock cycle instead of in 3 clocks as in the 386. By using 
a large number of transistors, it splits the fetching and execution of each instruction into 
many stages, all working in parallel. This allows the processing of up to five instructions 
to be overlapped. Pipelining in the 486 will be discussed further at the end of this section. 


Enhancement 2 


By putting 8K bytes of cache with the core of the CPU all on a single chip, the 
486 eliminates the interchip delay of external cache. In other words, while in the 386 the 
cache is external, the 486 has 8K bytes of on-chip cache to store both code and data. 
Although the 486 has 8K bytes of on-chip cache, 128K to 256K bytes of off-chip cache 
are also present in many systems. Off-chip cache (level two) is commonly referred to as 
secondary cache, while on-chip cache is called first-level cache. The 8K on-chip cache of 
the 486 has 2-way set associative organization and is used for storing both data and code. 
It uses the write-through policy for updating main memory. 


Enhancement 3 


Intel used some of the 1.2 million transistors to incorporate a math coprocessor on the 
same chip as the CPU. While in all previous x86 microprocessors the math coprocessor 
was a separate chip, in the 80486 the math coprocessor is part of a single IC along with 
the CPU. This reduced the interchip delay associated with multichip systems such as the 
386 and 387 but at the same time made the cost of an 80486 high compared to a 386 since 
the 80486 is in reality two chips in one: the main CPU and math coprocessor. For many 
people who did not need a math coprocessor this extra price was not justified. Therefore, 
Intel introduced the 80486SX, which is the main CPU alone, and a separate math coprocessor 
named the 80487SX. 




Enhancement 4 


Another major addition to the 486 is the use of 4 pins for data parity (DP), which 
allows implementation of parity error checking on the system board. The four pins DP0, 
DP1, DP2, and DP3 are bidirectional, and each is used for 1 byte of the D31-D0 data bus. 
When the 486 writes data it also provides the even-parity bit for each byte through the 
DP0-DP3 pins. When it reads the data it expects to receive the even-parity bit for each 
byte on the DP0-DP3 pins. After comparing them internally, if there is a difference 
between the data written and the data read, it activates the pin PCHK (parity check) to 
indicate the error. This means that PCHK is an output pin while DP0-DP3 are bidirectional 
I/O. It must be noted that inconsistency between data written and data read has no effect 
on the execution of code by the CPU. It is the responsibility of system designers to 
incorporate error detection by using the PCHK pin in their designs. In the above discussion of 
parity, the word data is meant to refer to both code and data. Figure 23-1 shows the 
memory organization of the 486. 
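
The per-byte even-parity scheme can be illustrated with a short Python sketch. It models only the arithmetic, not the pin timing. The pairing of DP0 with D7-D0 through DP3 with D31-D24 is an assumption extended from the labels in Figure 23-1, and the function names are hypothetical.

```python
def even_parity_bit(byte):
    """Parity bit that makes the total number of 1s (8 data bits plus
    the parity bit itself) even."""
    return bin(byte & 0xFF).count("1") % 2


def parity_bits(dword):
    """Return [DP0, DP1, DP2, DP3] for a 32-bit value.
    Assumed mapping: DP0 covers D7-D0 and DP3 covers D31-D24."""
    return [even_parity_bit((dword >> (8 * i)) & 0xFF) for i in range(4)]


def parity_error(dword, received_bits):
    """True if any received parity bit disagrees with the data read,
    which is the condition the text says PCHK reports."""
    return parity_bits(dword) != list(received_bits)


value = 0x99F25487
print(parity_bits(value))                    # [0, 1, 1, 0]
print(parity_error(value, [0, 1, 1, 0]))     # False
print(parity_error(value, [1, 1, 1, 0]))     # True
```

Because each byte has its own parity bit, a single-bit error can be narrowed down to one byte lane, although parity alone cannot say which bit within that byte changed.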


[Figure: byte-wide memory banks with parity bits; surviving labels show DP3 with D31-D24 and DP2 with D23-D16.] 


Figure 23-1. 486 Memory Organization with DP0-DP3 


Enhancement 5 


Another enhancement of the 486 involves the burst cycle. The memory cycle time 
of the 486 with the normal zero wait states is 2 clocks. In other words, it takes a minimum 
of 2 clocks to read from or write to external memory or I/O. In this regard, the 486 is like 
the 386. To increase the bus performance of the 486, Intel provides an additional option 
of implementing what is called a burst cycle. The 486 has two types of memory cycles, 
nonburst (which is the same as the 386) and burst mode. In the burst cycle, the 486 can 
perform 4 memory cycles in just 5 clocks. The way the 80486 performs the burst cycle 
read is as follows. The initial read is performed in a normal 2-clock memory cycle time, 
but the next three reads are performed each with only one clock. Therefore, four reads are 
performed in only 5 clocks. This is commonly referred to as 2-1-1-1 read, which means 2 
clocks for the first read and 1 clock for each of the following three reads. This is in con- 
trast to 386, which is 2-2-2-2 for reading 4 doublewords of aligned data. Of course, burst 
cycle reading is most efficient if the data and codes are in 4 doubleword (32-bit) consec- 
utive locations. In other words, the burst cycle can be used to fetch a maximum of 16 bytes 
of information into the CPU in only 5 clocks, provided that they are aligned on double- 
word boundaries. There are two pins, BRDY (burst ready) and BLAST (burst last), used 
specifically to implement the burst cycle. BRDY is an input into the 486 and BLAST is 
an output from the 486. See Figure 23-2 and Example 23-1. 


Enhancement 6 


The 486 supports all 386 instructions in addition to six new ones. They are shown 
in Table 23-1. Three of the new instructions, INVD, INVLPG, and WBINVD, are added 
specifically for dealing with the on-chip cache and the TLB entries. The XADD instruc- 




Figure 23-2. Burst Cycle Read in the 486 


Example 23-1 


Calculate and compare the bus bandwidth of the following systems. Assume that both are working 
with 33 MHz and that the 386 is 0 WS. Also assume that the data is aligned and is in 4 consecutive 
doubleword memory locations. 

(a) 386 (b) burst mode of the 486 


Solution: 


(a) In the 386, since each memory cycle takes 2 clocks, we have 
memory cycle time = 2 x (1/33 MHz) = 2 x 30 ns = 60 ns 
bus bandwidth = (1/60 ns) x 4 bytes = 66 megabytes/second 

(b) In burst mode, the 486 performs 4 memory cycles in only 5 clocks; therefore, the average 
memory cycle time in burst mode is 1.25 (5/4 = 1.25) clocks for each 32-bit (doubleword) of data fetched 
as long as they are aligned and located in consecutive memory locations. This results in 
bus bandwidth = [1/(1.25 x 30 ns)] x 4 bytes = 106.66 megabytes/second 
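
The same arithmetic can be written as a small Python helper, which makes it easy to try other clock rates. This is just a restatement of the example under its own assumptions (30 ns clock, aligned doublewords, zero wait states); the function name is hypothetical.

```python
def bus_bandwidth_mb_s(clock_ns, clocks_per_transfer, bytes_per_transfer=4):
    """Bytes moved per second, in megabytes/second (1 MB = 10**6 bytes)."""
    seconds_per_transfer = clocks_per_transfer * clock_ns * 1e-9
    return bytes_per_transfer / seconds_per_transfer / 1e6


CLOCK_NS = 30   # the example rounds 1/33 MHz to 30 ns

nonburst = bus_bandwidth_mb_s(CLOCK_NS, 2)        # 386: 2-2-2-2 read
burst = bus_bandwidth_mb_s(CLOCK_NS, 5 / 4)       # 486: 2-1-1-1 read

# Roughly 66.67 and 106.67 megabytes/second
print(round(nonburst, 2), round(burst, 2))
```

The ratio of the two results is 2/1.25 = 1.6, so the burst cycle gives about 60% more bandwidth at the same clock.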


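Table 23-1: New 486 Instructions 

Instruction    Meaning 
BSWAP          Byte swap 
CMPXCHG        Compare and exchange 
INVD           Invalidate cache 
INVLPG         Invalidate TLB entry 
WBINVD         Write back and invalidate cache 
XADD           Exchange and add 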




tion first loads the destination operand into the source and then loads the total sum of both 
the destination and the original source into the destination. The CMPXCHG instruction 
compares the accumulator, AL, AX, or EAX, with the destination operand, which could 
be a register or memory. If they are equal, ZF = 1 and the source is copied into the 
destination. If they are not equal, ZF = 0 and the destination is copied into the 
accumulator. For example, the instruction "CMPXCHG BX,CX" copies CX into BX only if BX 
= AX; otherwise, it copies BX into accumulator AX. 
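
The behaviour of XADD and CMPXCHG described above can be modelled in a few lines of Python. This is a sketch of the data movement only, assuming 16-bit operands. It tracks ZF for CMPXCHG but ignores every other flag, and the function names are hypothetical.

```python
MASK16 = 0xFFFF


def xadd(dest, src):
    """Model of XADD dest,src on 16-bit operands.
    Returns (new_dest, new_src): the source receives the old destination
    and the destination receives the sum."""
    return (dest + src) & MASK16, dest


def cmpxchg(acc, dest, src):
    """Model of CMPXCHG dest,src on 16-bit operands.
    Returns (new_acc, new_dest, zf)."""
    if acc == dest:
        return acc, src, 1       # equal: source copied into destination
    return dest, dest, 0         # not equal: destination copied into accumulator


print([hex(v) for v in xadd(0x1000, 0x0234)])   # ['0x1234', '0x1000']
print(cmpxchg(acc=5, dest=5, src=9))            # (5, 9, 1)
print(cmpxchg(acc=4, dest=5, src=9))            # (5, 5, 0)
```

In the second call the accumulator matches the destination, so the destination is replaced by the source; in the third call it does not, so the accumulator is reloaded with the destination's current value.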

As mentioned previously, some systems use little endian while others use the big 
endian convention of storing data. To allow the implementation of either, BSWAP is pro- 
vided. The BSWAP instruction converts the contents of a 32-bit register from the little 
endian to big endian, or vice versa. See Example 23-2. 


Example 23-2 


Find the contents of memory location ES:4000 after running the following program. 


MOV   EAX,[2000]      ;load EAX from memory DS:2000 
BSWAP EAX             ;change little endian to big endian 
MOV   ES:[4000],EAX   ;save the result at ES:4000 


Assume that memory locations DS:2000—DS:2003 have the following contents: 


DS:2000 = (87) 
DS:2001 = (54) 
DS:2002 = (F2) 
DS:2003 = (99) 


Solution: 


The first instruction brings in the data in the little endian format where the least significant byte is 
fetched into the least significant byte of EAX, which is the AL register. BSWAP makes the 87H the 
most significant byte and puts 99H into the AL register. Therefore, after the execution of the last 
instruction we have the following: 


ES:4000 = (99) 
ES:4001 = (F2) 
ES:4002 = (54) 
ES:4003 = (87) 


The addition of the BSWAP instruction makes it much easier for operating system software writers 
to convert their software from little endian to big endian, or vice versa. 
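
The result of Example 23-2 can be reproduced with a short Python sketch that mimics the three instructions. It is an illustration of the byte ordering only; the helper name `bswap32` and the variable names are hypothetical.

```python
def bswap32(value):
    """Reverse the byte order of a 32-bit value, as BSWAP does."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")


memory_ds = bytes([0x87, 0x54, 0xF2, 0x99])     # DS:2000-DS:2003
eax = int.from_bytes(memory_ds, "little")       # MOV EAX,[2000]
eax = bswap32(eax)                              # BSWAP EAX
memory_es = eax.to_bytes(4, "little")           # MOV ES:[4000],EAX

print(memory_es.hex(" ").upper())               # 99 F2 54 87
```

After the load, EAX holds 99F25487H; BSWAP turns it into 8754F299H, and the little endian store then places 99H at the lowest address, matching the solution above.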


CLK in the 80486 


Another difference is the clock frequency provided to the 486. As mentioned in 
Chapter 22, the CLK input frequency, which provides the fundamental timing for the 
internal working of the CPU, is twice the system frequency for 386 microprocessors. In 
the case of the 486, the CLK is the same as the system frequency. 


386, 486 performance comparison 


As stated earlier, most of the instructions in the 486 are executed with only one 
clock. See Example 23-3. 




Example 23-3 


Compare the clock count for the loop part of the following program run on the 386 and 486. This 
program transfers a block of DWORD data. Assume that the block size is 10. 


        MOV  CX,10               ;count=10 
        MOV  SI,OFFSET ARRAY1    ;load address of source 
        MOV  DI,OFFSET RESULT    ;load address of destination 
AGAIN:  MOV  EAX,DWORD PTR [SI]  ;get the element 
        MOV  [DI],EAX            ;store it 
        ADD  SI,4                ;point to next element 
        ADD  DI,4                ;point to next element of result 
        DEC  CX                  ;decrement the counter 
        JNZ  AGAIN               ;and go back if not zero 

Solution: 
                                     386   486 
AGAIN:  MOV  EAX,DWORD PTR [SI] 
        MOV  [DI],EAX 
        ADD  SI,4 
        ADD  DI,4 
        DEC  CX 
        JNZ  AGAIN 


Total for one iteration 


Notice the branch penalty for the JNZ instructions. If it goes back, it takes 7 clocks for the 386 and 
3 for the 486. If it falls through, it takes only 3 and 1 for the 386 and 486, respectively. Also notice 
that "MOV [DI],EAX" takes 2 clocks since EAX must be provided first by the previous instruction. 
This is called a data dependency. In the next section we compare 386 and 486 performance with that 
of the Pentium processor. 


More about pipelining 


In the 8085 there was no pipelining. At any given moment, it either fetched or it 
executed. It could not do both at the same time. In the 8085, while the buses were fetch- 
ing the instructions (opcodes) and data, the CPU was sitting idle, and in the same way, 
when the CPU was executing instructions, buses were sitting idle. However, in the 8086 
the fetch and execute were performed in parallel by two sections inside the CPU called 
the BIU (bus interface unit) and EU (execution unit). The 8086 has an internal queue 
where it keeps the opcodes that are prefetched and waiting for the execution unit to 
process them. In the sequence of instructions, if there is a jump (JMP, JNZ, JNC, and so 
on) or CALL, the prefetched buffer (queue) is flushed and the bus interface unit of the 
CPU brings in instructions from the target location while the execution unit waits for 
the new instruction. Since the introduction of the 8086 in 1978, microprocessor designers 
have come to rely more and more on the concept of pipelining to increase the processing 
power of the CPU. The next development was to expand the concept of a pipeline to the 
three stages of fetch, decode, and execute. In the 486, the pipeline stage is broken down 
even further to 5 stages as follows: 


1. fetch (prefetch) 

2. decode 1 

3. decode 2 

4. execute 

5. register write-back 




[Figure: instructions overlapped across the pipeline stages PF, D1, D2, EX, and WB.] 
PF = prefetch, D1 = decode 1, D2 = decode 2, EX = execute, WB = write back 

Each stage takes 1 clock, but when the pipeline is full each instruction will execute 
in a single clock. 


Figure 23-3. 486 Pipeline Stages 


Due to such a large number of addressing modes in the x86, a two-stage decoder 
is used for the calculation and protection check of operand addresses. The register write- 
back is the stage where the operand is finally delivered to the register. For example, after 
the instruction "ADD EAX,[EBX+ECX*8+200]" is fetched, the two decoding stages are 
responsible for calculating the physical address of the source operand, checking for a valid 
address, and getting it into the CPU. There the operand is added together with EAX dur- 
ing the execution stage, and finally, the addition result is written into EAX, the destina- 
tion register. Figure 23-3 shows the 486 pipeline. 
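
The timing behavior of the 5-stage pipeline described above can be checked with a short calculation. The following Python sketch is only an idealized model: it assumes one clock per stage and no stalls or flushes, and the function names are chosen here for illustration.

```python
def pipeline_clocks(num_instructions, stages=5):
    """Clocks needed on an ideal pipeline with no stalls or flushes."""
    if num_instructions == 0:
        return 0
    # The first instruction passes through every stage; after that,
    # one instruction completes on every clock.
    return stages + (num_instructions - 1)


def unpipelined_clocks(num_instructions, stages=5):
    """Clocks needed if each instruction finishes before the next starts."""
    return stages * num_instructions


for n in (1, 5, 100):
    print(n, pipeline_clocks(n), unpipelined_clocks(n))
```

For a long run of instructions the pipelined count approaches one clock per instruction, which is the point made in Figure 23-3.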


Review Questions 


1. How many pins does the 80486 have, and what kind of packaging is used for it? 
2. True or false. The 486 is a 32-bit microprocessor. 

3. The 80486 has a(n) ____-bit external and a(n) ____-bit internal data bus. 

4. State the difference between the 80486 and the 80486SX. 

5. On-chip cache is referred to as ____, while off-chip cache is called ____. 

6. State the size of the on-chip cache for the 486 and the cache organization. 

7. Calculate the bus bandwidth of a 486 burst read for a 50-MHz system. 

8. If the 486 is advertised as 33 MHz, the clock frequency connected to the CLK pin 
is ____. 

9. Pin A20M is an ____ (input, output) signal. 

10. A20 (the twentieth address bit) is an ____ (input, output) signal for the 
486. 




CHAPTER 23: PENTIUM AND RISC PROCESSORS 595 


SECTION 23.2: INTEL'S PENTIUM 


Intel put 3.1 million transistors on a single piece of silicon using a 273-pin PGA 
package to design the next generation of x86. It is called Pentium instead of 80586. The 
name Pentium was chosen to distinguish it from clones because it is hard to trademark a 
number such as 80586. There are three ways available to microprocessor designers to 
increase the processing power of the CPU. 


1. Increase the clock frequency of the chip. One drawback of this method is that the 
higher the frequency, the more the power dissipation and the more difficult and 
expensive the design of the microprocessor and motherboard. 

2. Increase the number of data buses to bring more information (code and data) into 
the CPU to be processed. While in the case of DIP packaging this option was very 
expensive and unrealistic, in today's PGA packaging this is no longer a problem. 

3. Change the internal architecture of the CPU to overlap the execution of more 
instructions. This requires a lot of transistors. There are two trends for this option, 
superpipeline and superscalar. In superpipelining, the process of fetching and exe- 
cuting instructions is split into many small steps and all are done in parallel. In this 
way the execution of many instructions is overlapped. The number of instructions 
being processed at a given time depends on the number of pipeline stages, com- 
monly termed the pipeline depth. Some designers use as many as 8 stages of 
pipelining. One limitation of superpipelining is that the speed of execution is 
limited by the slowest stage of the pipeline. Compare this to making pizza. You 
can split the process of making pizza into many stages, such as flattening the 
dough, putting on the toppings, and baking, but the process is limited to the slowest 
stage, baking, no matter how fast the rest of the stages are performed. What hap- 
pens if we use two or three ovens for baking pizzas to speed up the process? This 
may work for making pizza but not for executing programs, since in the execution 
of instructions we must make sure that the sequence of instructions is kept intact 
and that there is no out-of-step execution. The difficulties associated with a stalled 
pipeline (a slowdown in one stage of the pipeline, which prevents the remaining 
stages from advancing) have made CPU designers abandon superpipelining in favor 
of superscaling. In superscaling, the entire execution unit has been doubled and 
each unit has 5 pipeline stages. Therefore, in superscalar, there is more than one 
execution unit and each has many stages, rather than one execution unit with 8 
stages as in the case of a superpipelined processor. In some superscalar processors, 
there are two execution units each with 4 pipeline stages instead of a single execu- 
tion unit with 8 pipeline stages as superpipelining proponents would have it. In 
other words, in superscaling we have two (or even three) execution units and as the 
instructions are fetched they are issued to the various execution units. Using the 
analogy of pizza, superscalar is like doubling or tripling the entire crew flattening 
the dough, putting toppings on, and baking. Of course, you will need a lot more 
people involved in the process and you have to have more ovens, but at the same 
time you are doubling or tripling the pizza output. In cases of recent microproces- 
sor architecture, a vast majority of designers have chosen superscaling over super- 
pipelining. This requires numerous transistors to duplicate several execution units, 
just like needing more people in our pizza-making analogy. Fortunately, advances 
in IC design have allowed designers access to a couple of million transistors to 
throw around for the implementation of powerful superscaling. There are some 
problems with superscaling, such as data dependency issues, which can be solved 
by the compiler, as we will discuss below. 
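
The pizza analogy in item 3 can be put into numbers. The Python sketch below uses made-up stage times (in minutes) purely for illustration; it shows that a pipeline's output rate is set by its slowest stage and that duplicating the whole line, as in superscaling, doubles the rate.

```python
def pipeline_rate(stage_times):
    """Items finished per unit time once the pipeline is full."""
    return 1.0 / max(stage_times)


# Hypothetical stage times in minutes: flatten dough, add toppings, bake.
pizza_stages = [2, 3, 10]

one_line = pipeline_rate(pizza_stages)          # one crew, one oven
faster_prep = pipeline_rate([1, 1, 10])         # faster prep, same oven
two_lines = 2 * pipeline_rate(pizza_stages)     # whole line duplicated

print(one_line, faster_prep, two_lines)
```

Speeding up the preparation stages does not change the rate, since baking still takes 10 minutes, while a second complete line doubles it.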




[Figure: block diagram of the Pentium showing the code cache, branch prediction, prefetch 
buffers, bus interface, U-pipe and V-pipe ALUs, register set, data cache, and the pipelined 
floating-point unit with its multiply and add blocks.] 

Figure 23-4. Inside the Pentium 


Intel used all three methods to increase the processing power of the Pentium. 
Intel first shipped the Pentium at 60 and 66 MHz and eventually reached 3 GHz with the Pentium 4. 
The Pentium has a 64-bit external data bus and is a superscalar processor with two 
execution units to process integer data. This is in addition to a separate execution unit for 
floating-point data. 




Features of the Pentium 


The following are some of the major features of the Pentium processor. 


Feature 1 


In the Pentium, the external data bus is 64 bits wide, which brings twice as much 
code and data into the CPU as the 486. However, just like the 386 and 486, Pentium registers 
are 32-bit. Bringing in twice as much information can work only if there are two 
execution units inside the processor, and this is exactly what Intel has done. The Pentium 
uses 64 pins, D0-D63, to access external memory banks, which are 64 bits wide. D0-D7 
is the least significant byte, and D56-D63 is the most significant byte. Accessing 8 bytes 
of external data bus requires 8 BE (byte enable) pins, BE0-BE7, where BE0 is for D0-D7, 
BE1 for D8-D15, and so on. This is shown in Figure 23-5 and Table 23-2. 

While in the 486 there were four DP (data parity) pins, one for each of the 4 bytes 
of the data bus, in the Pentium there are 8 DP pins to handle the 8 bytes of data pins 
D0-D63. The Pentium has A31 to A3 for the address buses. This is shown in Figure 23-6. 
Just like the 486, the Pentium also has the A20M (A20 Mask) input pin for the implementation 
of HMA (high memory area). 
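
Since the Pentium supplies only A31-A3, the low three address bits are conveyed by the byte enables. The Python sketch below illustrates the idea for an access that fits inside one 8-byte quadword. The function names are hypothetical, and the mask uses a 1 bit to mean "this byte lane is selected" even though the BE# pins themselves are active low.

```python
def byte_enables(address, size):
    """Return (value on A31-A3, 8-bit lane mask) for one bus cycle.

    Bit n of the mask is 1 when byte lane n (BEn#, D(8n)-D(8n+7)) is selected.
    """
    offset = address & 0x7
    if size < 1 or offset + size > 8:
        raise ValueError("access must fit inside one 8-byte quadword")
    mask = ((1 << size) - 1) << offset
    return address >> 3, mask


def active_lanes(mask):
    """List the selected byte enables with their data pins."""
    return [f"BE{n}# (D{8 * n}-D{8 * n + 7})" for n in range(8) if mask & (1 << n)]


quadword, mask = byte_enables(0x1004, 4)   # doubleword at offset 4
print(hex(quadword), format(mask, "08b"), active_lanes(mask))
```

A doubleword at address 1004H therefore uses the upper four lanes, BE4# through BE7#, in agreement with Table 23-2.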


Feature 2 


The Pentium has a total of 16K bytes of on-chip cache: 8K is for code and the 
other 8K is for data. In the 486 there is only 8K of on-chip cache for both code and data. 
The data cache can be configured as write-back or write-through, but to prevent any accidental 
writing into the code cache, the 8K of code cache is write protected. In other words, 
while the CPU can read or write into the data cache, the code cache is write protected to 
prevent any inadvertent corruption. Of course, when there is a cache miss for the code cache, 
the CPU brings code from external memory and stores (writes) it in the code cache, but 
no instruction executing in the CPU can write anything into the code cache. The replacement 
policy for both data and code caches is LRU (least recently used). 


Table 23-2: Pentium Byte Enable Signals 

Byte Enable Signal    Associated Data Bus Signals 

BE0#                  D0-D7 (byte 0, the least significant) 
BE1#                  D8-D15 (byte 1) 
BE2#                  D16-D23 (byte 2) 
BE3#                  D24-D31 (byte 3) 
BE4#                  D32-D39 (byte 4) 
BE5#                  D40-D47 (byte 5) 
BE6#                  D48-D55 (byte 6) 
BE7#                  D56-D63 (byte 7, the most significant) 


[Figure: the 4G-byte physical memory space, from 00000000H to FFFFFFFFH, drawn as a 
64-bit-wide memory organization. Each row holds 8 bytes, from 00000007H-00000000H at the 
bottom to FFFFFFFFH-FFFFFFF8H at the top, and the byte columns are selected by BE7# 
through BE0#.] 

(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1993) 

Figure 23-5. Pentium Memory Organization 


[Figure: the Pentium processor connected to 64-bit memory through A31-A3 and BE7#-BE0#.] 

(Reprinted by permission of Intel Corporation, Copyright Intel Corp. 1993) 

Figure 23-6. Pentium Address Buses 

Both the on-chip data and code caches are accessed internally by the CPU core 
simultaneously. However, since there is only one set of address buses, the external cache 
containing both data and code must be accessed one at a time and not simultaneously. 
Some CPUs, notably RISC processors, use a separate set of address and data pins (buses) 
for the data and another set of address and data buses for the code section of the program. 
This is called Harvard architecture and will be discussed in the next section. The Pentium 
accesses the on-chip code and data caches simultaneously using Harvard architecture, but 
not the secondary (external) off-chip cache and data. The Pentium's cache organization for 
both the data and code caches is 2-way set associative. Each 8K is organized into 128 sets 
of 64 bytes, which means 2^7 x 2^6 = 2^13 = 8192 = 8K bytes. Each set consists of 2 lines 
of cache, and each line is 32 bytes wide. 
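
The cache geometry just described can be verified with a few lines of arithmetic. The Python sketch below derives the number of sets from the 8K size, 2 ways, and 32-byte lines. The way it splits a 32-bit address into tag, set index, and byte offset is the usual textbook arrangement and is an assumption here, not something taken from Intel documentation.

```python
CACHE_BYTES = 8 * 1024   # one of the two 8K on-chip caches
WAYS = 2                 # 2-way set associative
LINE_BYTES = 32          # bytes per cache line

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)     # 128 sets of 64 bytes
OFFSET_BITS = LINE_BYTES.bit_length() - 1     # 5 bits select a byte in a line
INDEX_BITS = SETS.bit_length() - 1            # 7 bits select a set


def split_address(addr):
    """Split a 32-bit address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(SETS, OFFSET_BITS, INDEX_BITS)
print([hex(field) for field in split_address(0x12345678)])
```

Under this assumed split, 5 offset bits and 7 index bits leave a 20-bit tag for each line.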


Feature 3 


The on-chip math coprocessor of the Pentium is many times faster than the one 
on the 486. It has been redesigned to perform many of the instructions, such as add and 
multiply, ten times faster than the 486 math coprocessor. In microprocessor terminology, 
the on-chip math coprocessor is commonly referred to as a floating point unit (FPU) while 
the section responsible for the execution of integer-type data is called the integer unit (IU). 
The FPU section of the Pentium uses an 8-stage pipeline to process instructions, in con- 
trast to the 5-stage pipeline in the integer unit. See Figure 23-7. 


Feature 4 


Another unique feature of the Pentium is its superscalar architecture. A large num- 
ber of transistors were used to put two execution units inside the Pentium. As the instruc- 
tions are fetched, they are issued to these two execution units. However, issuing two 
instructions at the same time to different execution units can work only if the execution of 
one does not depend on the other one, in other words, if there is no data dependency. As 
an example, look at the following instructions. 


ADD EAX,EBX      ;add EBX to EAX 
NOT EAX          ;take 1's complement of EAX 
INC DI           ;increment the pointer 
MOV [DI],EBX     ;move out EBX 


In the above code, the ADD and NOT instructions cannot be issued to two execu- 
tion units since EAX, the destination of the first instruction, is used immediately by the 
second instruction. This is called read-after-write dependency since the NOT instruction 
wants to read the EAX contents, but it must wait until after the ADD is finished writing it 
into EAX. The problem is that ADD will not write into EAX until the last stage of the 
pipeline, and by then it is too late for the pipeline of the NOT instruction. This prevents 
the NOT instruction from advancing in the pipeline, therefore causing the pipeline to be 
stalled until the ADD finishes writing and then the NOT instruction can advance through 
the pipeline. This kind of register dependency raises the clock count from 1 to 2 for the 
NOT instruction. What if the instructions are rescheduled, as follows? 


ADD EAX,EBX      ;add EBX to EAX 
INC DI           ;increment the pointer 
NOT EAX          ;take 1's complement of EAX 
MOV [DI],EBX     ;move out EBX 


If they are rescheduled as shown above, each can be issued to separate execution 
units, allowing parallel execution of both instructions by two different units of the CPU. 
Since the clock count for each instruction is one, just like the 486, having two execution 
units leads to executing two instructions by pairing them together, thereby using only one 
clock count for two instructions. In the case of the above program, if it is run on the 
Pentium it will take only 2 clocks instead of 4 as in the case of the 486 microprocessor, 
assuming that two instructions are paired together. This reordering of instructions to take 
advantage of the two internal execution units of the Pentium is the job of the compiler and 
is called instruction scheduling. Currently, compilers are being equipped to do instruction 
scheduling to remove dependencies. The role of the compiler to reschedule instructions in 
order to take advantage of the superscalar capability of the Pentium must be emphasized. 
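
The effect of scheduling on pairing can be illustrated with a small model. The Python sketch below is a deliberate simplification: it issues instructions in order, pairs two neighbors only when the second neither reads nor writes a register the first one writes, counts one clock per issue slot, and ignores flags, stall clocks, and the restrictions on which instructions may use the second pipe. The register sets listed for each instruction are written out by hand.

```python
def clocks_with_pairing(program):
    """program: list of (text, registers read, registers written)."""
    clocks = 0
    i = 0
    while i < len(program):
        clocks += 1
        if i + 1 < len(program):
            first_writes = program[i][2]
            second_reads, second_writes = program[i + 1][1], program[i + 1][2]
            # Pair only when the second instruction does not touch
            # a register that the first instruction writes.
            if not (first_writes & (second_reads | second_writes)):
                i += 2
                continue
        i += 1
    return clocks


original = [
    ("ADD EAX,EBX", {"EAX", "EBX"}, {"EAX"}),
    ("NOT EAX", {"EAX"}, {"EAX"}),
    ("INC DI", {"DI"}, {"DI"}),
    ("MOV [DI],EBX", {"DI", "EBX"}, set()),
]
rescheduled = [original[0], original[2], original[1], original[3]]

print(clocks_with_pairing(original), clocks_with_pairing(rescheduled))
```

In this model the original order needs three issue slots because ADD and NOT cannot be paired, while the rescheduled order needs only two.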
The process of issuing two instructions to the two execution units is commonly referred 
to as instruction pairing. The two integer execution units of the Pentium are called "U" 
and "V" pipes. Each has 5 pipeline stages. While the U pipe can execute any of the instruc- 
tions in the x86 family, the V pipe executes only simple instructions such as INC, DEC, 
ADD, SUB, MUL, DIV, NOT, AND, OR, EXOR, and NEG. These simple instructions are 
executed in one clock as long as the operands are "REG,REG" or "REG,IMM" and have 
no register dependency. For example, instructions such as "ADD EAX,EBX", "SUB 
ECX,2000", and "MOV EDX,1500" are simple instructions requiring 1 clock, but not 
"ADD DWORD PTR [EBX+EDI+500],EAX", which needs 3 clocks. 


Feature 5 


Branch prediction is another new feature of the Pentium. The penalty for jumping 
is very high for a high-performance pipelined microprocessor such as the Pentium. For 
example, in the case of the JNZ instruction, if it jumps, the pipeline must be flushed and 
refilled with instructions from the target location. This takes time. In contrast, the instruc- 
tion immediately below the JNZ is already in the pipeline and is advancing without delay. 
The Pentium processor has the capability to predict and prefetch code from both possible 
locations and have them advanced through the pipeline without waiting (stalling) for the 
outcome of the zero flag. The ability to predict branches and avoid the branch penalty 
combined with the instruction pairing can result in a substantial reduction in the clock 
count for a given program. See Example 23-4. 


Feature 6 


As discussed in Chapter 21, the 386/486 has a page size of 4K for paged virtual 
memory. The Pentium provides the option of 4K or 4M for the page size. The 4K page 
option makes it 386 and 486 compatible, while the 4M page size option allows mapping 
of a large program without any fragmentation. The 4M page size in the Pentium reduces 
the frequency of a page miss in virtual memory. 


Feature 7 


As discussed in Chapter 21, the 386 (and 486) has only 32 entries for the TLB 
(translation lookaside buffer), which means that the CPU has instant knowledge of the 
whereabouts of only 128K of code and data. If the desired code or data is not referenced 
in the TLB, the CPU must go through the long process of converting the linear address to 
a physical address. The Pentium has two sets of TLB, one for code and one for data. For 
data, the TLB has 64 entries for 4K pages. This means that the CPU has quick access to 
256K (64 x 4K = 256K) of data. The TLB for the code is 32 entries of 4K page size. 
Therefore, the CPU has quick access to 128K of code at any given time. Combining the 
TLBs for the code and data, the Pentium has quick access to 384K (128 + 256) of code 
and data before it resorts to updating the TLB for the page miss. Contrast this to 128K for 
the 486. If the page size of 4M is chosen, the TLB for the data has 8 entries while the TLB 
for the code has 32 entries. 
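
The amount of memory the TLBs can cover is simply the number of entries times the page size. The Python sketch below repeats the arithmetic above and, as an extra step not stated in the text, applies the same formula to the 4M page size.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries, page_bytes):
    """Bytes of memory whose translations fit in the TLB at one time."""
    return entries * page_bytes


code_4k = tlb_reach(32, 4 * KB)   # code TLB, 4K pages
data_4k = tlb_reach(64, 4 * KB)   # data TLB, 4K pages
data_4m = tlb_reach(8, 4 * MB)    # data TLB, 4M pages

print(code_4k // KB, data_4k // KB, (code_4k + data_4k) // KB, data_4m // MB)
```

The first three results are the 128K, 256K, and 384K figures quoted above; with 4M pages, 8 data entries cover 32M bytes.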


Feature 8 


The Pentium has both burst read and burst write cycles. This is in contrast to the 
486, which has only the burst read. This means that in the 486 any write to consecutive 
doubleword locations must be performed with the normal 2 clock cycles. This is not the 
case in the Pentium. 





Figure 23-7. Pentium Pipeline 


Example 23-4 


Compare the clock count for the program in Example 23-3, run on a 486 and a Pentium. Assume 
that the compiler has done the code scheduling to allow instruction pairing for the Pentium. 
Solution: 

First compare the following rearranged code with the code in Example 23-3. Here we have rescheduled 
instructions "ADD SI,4" and "ADD DI,4" to avoid register dependency. This allows pairing 
of two instructions and issuing them to execution units of the Pentium. For example, the instructions 
"MOV EAX,DWORD PTR [SI]" and "ADD SI,4" are issued simultaneously, one to each execution 
unit. This results in the execution of both instructions in only one clock. 


AGAIN: MOV EAX,DWORD PTR [SI] 
       ADD SI,4 
       MOV [DI],EAX 
       ADD DI,4 
       DEC CX 
       JNZ AGAIN 


Total clock count for one iteration: 3 


In the 486, the execution of "JNZ AGAIN" takes 3 clocks every time it jumps to AGAIN, but for 
the Pentium it takes only 1 clock since the CPU has predicted the branch, fetched it, and the instruc- 
tions at label AGAIN are in the pipeline advancing. This way, regardless of the outcome of the JNZ, 
both the instruction below JNZ and the first instruction at label AGAIN are in two separate 
pipelines, advancing. If ZF = 0, the other pipeline is trashed, and if ZF = 1 (the end of loop), the 
instruction below the JNZ is executed and the branch prediction pipeline is abandoned. In the above 
program each iteration takes only 3 clocks on the Pentium compared with 9 clocks in the 486. While 
branch prediction is performed by the internal hardware of the Pentium, instruction scheduling must 
be done by the compiler. 
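
The clock totals quoted in this example can be reproduced with a short tally. In the Python sketch below, the 486 clock counts are assumptions consistent with the text (2 clocks for "MOV [DI],EAX", 3 clocks for a taken JNZ, and 1 clock for each remaining instruction), and the Pentium figure assumes that all three pairs issue in 1 clock each with a correctly predicted branch. The cheaper fall-through on the final iteration is ignored.

```python
# Assumed 486 clock counts for the loop body (taken branch).
clocks_486 = {
    "MOV EAX,DWORD PTR [SI]": 1,
    "MOV [DI],EAX": 2,          # waits for EAX (data dependency)
    "ADD SI,4": 1,
    "ADD DI,4": 1,
    "DEC CX": 1,
    "JNZ AGAIN": 3,             # branch taken
}

# Assumed Pentium pairing of the rescheduled loop: one clock per pair.
pentium_pairs = [
    ("MOV EAX,DWORD PTR [SI]", "ADD SI,4"),
    ("MOV [DI],EAX", "ADD DI,4"),
    ("DEC CX", "JNZ AGAIN"),
]

per_iteration_486 = sum(clocks_486.values())
per_iteration_pentium = len(pentium_pairs)


def total_clocks(per_iteration, iterations):
    """Approximate clocks for the whole loop."""
    return per_iteration * iterations


print(per_iteration_486, per_iteration_pentium)
print(total_clocks(per_iteration_486, 100), total_clocks(per_iteration_pentium, 100))
```

Under these assumptions a 100-iteration copy takes about 900 clocks on the 486 and about 300 on the Pentium.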




The Pentium has features that lend themselves to implementation of multiple 
microprocessors (multiprocessors) working together. It also has features called error 
detection and functional redundancy to preserve and ensure data and code integrity. 


Intel's overdrive technology 


To increase both the internal and external clock frequency of the CPU requires 
faster DRAM, high-speed motherboard design, high-speed peripherals, and efficient 
power management due to a high level of power dissipation. As a result, the system is 
much more expensive. To solve this problem, Intel came up with what is called overdrive 
technology, also referred to as clock doubler and tripler. The idea of a clock doubler or 
tripler is to increase the internal frequency of the CPU while the external frequency 
remains the same. In this way, the CPU processes code and data internally faster while 
the motherboard costs remain the same. For example, the 486DX2-50 uses the internal 
frequency of 50 MHz but the external frequency by which the CPU communicates with 
memory and peripherals is only 25 MHz. This allows the instructions stored in the queue 
of the 486 to be executed at twice the speed of fetching them from the system buses. With the 
advent of 32- and 64-bit external buses, on-chip cache, and burst cycle reading (reading 
16 bytes in only 5 clocks), the amount of code and data fetched into the queue of the CPU 
is sufficient to keep the execution unit of the CPU busy even if it is working with twice or 
three times the speed of external buses. This is the reason that Intel is designing proces- 
sors with clock triplers. In that case, if the CPU's external buses are working at the speed 
of 33 MHz, the CPU works at 99 MHz speed. The design of a system board of 33 MHz 
costs much less than that of a 100-MHz system board. With slower memory and periph- 
erals one can get instruction throughput of three times the bus throughput. As designers 
move to wider data buses, such as 128-bit-wide buses, the use of clock doublers and 
triplers is one way of keeping the system board cost down without sacrificing system 
throughput. The Intel 486DX4 is an example of a clock-tripler CPU. Note that "X4" does 
not mean that the internal frequency is 4 times the external frequency. 
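
The relationship between the two clock rates, and the burst-read figure of 16 bytes in 5 clocks mentioned above, can be worked out as follows. This Python sketch is only a back-of-the-envelope calculation; the function names are invented for illustration and the bandwidth is expressed in millions of bytes per second.

```python
def internal_mhz(external_mhz, multiplier):
    """Core clock of a clock-doubled (x2) or clock-tripled (x3) CPU."""
    return external_mhz * multiplier


def burst_read_bandwidth(external_mhz, bytes_per_burst=16, clocks_per_burst=5):
    """Peak burst-read bandwidth in millions of bytes per second."""
    return external_mhz * bytes_per_burst / clocks_per_burst


print(internal_mhz(25, 2))         # 486DX2-50: 25-MHz bus, 50-MHz core
print(internal_mhz(33, 3))         # clock tripler on a 33-MHz bus
print(burst_read_bandwidth(33))    # burst reads on a 33-MHz bus
```

The bus side of the calculation depends only on the external frequency, which is why the motherboard cost stays the same when the multiplier is raised.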


Review Questions 


1. The Pentium chip has ____ pins. 

2. The Pentium has ____ data pins. 

3. True or false. The Pentium is a 32-bit processor. 

4. What is the total cache on the Pentium? How much is for data, and how much is 
for code? 

5. Which is write protected, data or code cache? 

6. True or false. The on-chip data and code cache are accessed simultaneously. 

7. True or false. The branch prediction task is performed by circuitry inside the 
Pentium. 

8. Why is the Pentium called a superscalar processor? 

9. True or false. Instruction scheduling is done by circuitry inside the Pentium. 

10. True or false. The general-purpose registers of the Pentium are the same as those in 
the 386 and 486. 


SECTION 23.3: RISC ARCHITECTURE 


In the early 1980s a controversy broke out in the computer design community, but 
unlike most controversies, it did not go away. Since the 1960s, in all mainframe and mini- 
computers, designers put as many instructions as they could think of into the microinstruc- 
tions of the CPU. Some of these instructions performed complex tasks. An example is 
adjusting the result of decimal addition to get BCD nibble-type data. Naturally, micro- 
processor designers followed the lead of minicomputer and mainframe designers. Since 
these microprocessors used such a large number of instructions and many of them performed 
highly complex activities, they came to be known as CISC (complex instruction 
set computer). According to several studies in the 1970s, many of these complex instruc- 
tions etched into the brain of the CPU were never used by programmers and compilers. 
The huge cost of implementing a large number of instructions (some of them complex) 
into the microprocessor, plus the fact that more than 60% of the transistors on the chip are 
used by the instruction decoder, made some designers think of simplifying and reducing 
the number of instructions. As this concept was developed, it came to be known as RISC 
(reduced instruction set computer). 


Features of RISC 


The following are some of the features of RISC. It must be noted that recent CISC 
processors such as the Pentium have used some of the following features in their design. 


Feature 1 


RISC processors have a fixed instruction size. In a CISC microprocessor such as 
the x86, instructions can be 1, 2, or even 6 bytes. For example, look at the following 
instructions. 


CLC                 ;a 1-byte instruction 
SUB DX,DX           ;a 2-byte instruction 
ADD EAX,[SI+8]      ;a 5-byte instruction 
JMP FAR             ;a 5-byte instruction 


This variable instruction size makes the task of the instruction decoder very diffi- 
cult since the size of the incoming instruction is never known. In a RISC microprocessor, 
the size of all instructions is fixed at 4 bytes (32 bits). In cases where instructions do not 
require all 32 bits, they are filled with zeros. Therefore, the CPU can decode the instruc- 
tions quickly. This is like a bricklayer working with bricks of the same size as opposed to 
using bricks of variable sizes. Of course, it is much more efficient to use the bricks of the 
same size. 


Feature 2 


RISC uses load/store architecture. In CISC microprocessors, data can be manip- 
ulated while it is still in memory. For example, in x86 instructions such as "ADD 
[BX],AL", the microprocessor must bring the contents of the memory location pointed at 
by BX into the CPU, add it to AL, then move the result back to the memory location point- 
ed at by BX. In RISC, designers did away with this kind of instruction. In RISC, instruc- 
tions can only load from memory into registers or store registers into memory locations. 




Figure 23-8. RISC Integer and Floating-Point Registers 




There is no direct way of doing arithmetic and logic instructions between registers and 
contents of memory locations. All these instructions must be performed by first bringing 
both operands into the registers inside the CPU, then performing the arithmetic or logic 
operation, and then sending the result back to memory. This idea was first implemented 
by the CRAY 1 supercomputer in 1976 and is commonly referred to as load/store archi- 
tecture. 
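
The load/store sequence that replaces a memory-operand instruction such as "ADD [BX],AL" can be sketched with a toy register machine. The Python model below is purely illustrative: the register names, the memory address 200H, and the three helper functions are all invented for this example, with the destination register written last as in the RISC syntax used in this chapter.

```python
memory = {0x200: 0x10}                        # one byte of data memory
regs = {"r1": 0x200, "r2": 0x05, "r3": 0}     # r1 = pointer, r2 = addend


def load(addr_reg, dest):
    """Load the byte addressed by addr_reg into register dest."""
    regs[dest] = memory[regs[addr_reg]]


def add(src1, src2, dest):
    """Register-to-register add, truncated to 8 bits."""
    regs[dest] = (regs[src1] + regs[src2]) & 0xFF


def store(src, addr_reg):
    """Store register src at the address held in addr_reg."""
    memory[regs[addr_reg]] = regs[src]


# Equivalent of the single CISC instruction "ADD [BX],AL":
load("r1", "r3")        # bring the memory operand into a register
add("r3", "r2", "r3")   # do the arithmetic entirely in registers
store("r3", "r1")       # send the result back to memory

print(hex(memory[0x200]))
```

Only the load and store steps touch memory; the arithmetic itself sees nothing but registers.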


Feature 3 


One of the major characteristics of RISC architecture is a large number of regis- 
ters. All RISC microprocessors have 32 registers, r0-r31, each 32 bits wide. See Figure 
23-8. Of these 32 registers, only a few of them are assigned to a dedicated function. For 
example, r0 is automatically assigned the value zero and no other value can be written to 
it. One advantage of a large number of registers is that it avoids the use of the stack to 
store parameters. Although a stack can be implemented on a RISC processor, it is not as 
essential as in CISC since there are so many registers available. It must be noted that RISC 
processors, in addition to 32 general-purpose registers, also have another 32 registers for 
floating-point operations. The floating-point register can be configured as 64-bit in order 
to handle double-precision operands. 


Feature 4 


RISC processors have a small instruction set. RISC processors have only the basic 
instructions such as ADD, SUB, MUL, DIV, LOAD, STORE, AND, OR, EXOR, SHR, 
SHL, CALL, and JMP. For example, there are no such instructions as INC, DEC, NOT, 
NEG, DAA, DAS, and so on. Since RISC has very few instructions, it is the job of the pro- 
grammer (compiler) to implement those instructions by using available RISC instructions. 
One example is an immediate load instruction such as "MOV AX,25 ", which does not 
exist in some RISC processors. Instead, some other instructions, such as the OR instruc- 
tion, can be used to implement an immediate move as shown in the following example for 
the given RISC processor. 


or 25,r0,r8         ;OR 25 with r0 and put result in r8 


In RISC syntax, the destination register is the last register, r8 in the above example. 
Since r0 is always zero, ORing any number with it will result in that number. The 
above example will place 25 in r8. Another example is that there is no INC (increment) 
command. The ADD instruction is used instead, as in the following example: 


add 1,r15,r15       ;add 1 to r15 and place result in r15 


The limited number of instructions is one of the criticisms leveled at the RISC 
processor since it makes the job of Assembly language programmers much more tedious 
and difficult compared to CISC Assembly language programming. This is one reason that 
RISC is used more commonly in high-level language environments such as C rather than 
Assembly language environments. It is interesting to note that some defenders of CISC 
have called it "complete instruction set computer" instead of "complex instruction set 
computer" since it has a complete set of every kind of instruction. How many of them are 
used is another matter. The limited number of instructions in RISC leads to programs 
that are large, as Example 23-5 shows. Although this can lead to using more memory, this 
is not a problem since DRAM memory is so cheap. However, before the advent of semi- 
conductor memory in the 1960s, CISC designers had to pack as much action as possible 
into a single instruction. 


Feature 5 


At this point, one might ask, with all the difficulties associated with RISC pro- 
gramming, what is the gain? The most important characteristic of the RISC processor is 
the fact that more than 99% of its instructions are executed with only 1 clock, in contrast 
to CISC instructions. Although some instructions in the 80486 microprocessor are executed 
with 1 clock through the use of RISC concepts in its design, it is still a CISC processor 
for the reasons discussed above. Even the other 1% of the RISC instructions that are 
executed with 2 clocks can be executed with one clock cycle by juggling instructions 
around (code scheduling). Code scheduling is the job of the compiler. What did designers 
do with all those transistors saved using the RISC implementation? In the case of RISC 
processors, these extra transistors are used to implement the math coprocessor, powerful 
cache and cache controller, and a very powerful graphics processor all on a single chip. 
In many computers, such as 386-based systems, all these functions are performed by separate 
chips. 


Example 23-5 


In the x86, the NOT instruction performs the 1's complement operation, but RISC does not have 
such an instruction. How is the 1's complement operation performed in RISC? Show code to take 
the 1's complement of 25H on both RISC and x86. 


Solution: 


RISC has an EXOR instruction. If we EXOR the operand with all 1s, the operand is inverted. 
In RISC we have 


or  25H,r0,r8       ;OR 25H with r0 and put result in r8 (r8 = 25H) 
or  FFH,r0,r5       ;OR FFH with r0 and put result in r5 (r5 = FFH) 
xor r8,r5,r9        ;XOR r8 with r5 and put result in r9 (r9 = DAH) 


Since each instruction is 4 bytes (32-bit), the three instructions take a total of 12 bytes of memory. 
In the x86, a CISC-type processor, we have the following, which takes only 4 bytes of memory (to 
see, use DEBUG to assemble): 


MOV AL,25H 
NOT AL 
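
The result in Example 23-5 is easy to confirm. The Python sketch below checks that EXORing 25H with FFH gives DAH and compares the code sizes. The x86 byte values shown (B0 25 for "MOV AL,25H" and F6 D0 for "NOT AL") are the standard encodings, and the helper name is invented for illustration.

```python
def ones_complement_via_xor(value, width_bits=8):
    """Invert every bit by XORing with a mask of all 1s."""
    mask = (1 << width_bits) - 1
    return (value ^ mask) & mask


risc_bytes = 3 * 4                   # three fixed-size 32-bit instructions
x86_code = bytes([0xB0, 0x25,        # MOV AL,25H
                  0xF6, 0xD0])       # NOT AL
x86_bytes = len(x86_code)

print(hex(ones_complement_via_xor(0x25)), risc_bytes, x86_bytes)
```

The same operation costs three times as much code memory on the fixed-size RISC encoding, which is the trade-off discussed under Feature 4.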


Feature 6 


Since CISC has such a large number of instructions, each with so many different 
addressing modes, microinstructions (microcode) are used to implement them. The imple- 
mentation of microinstructions inside the CPU takes more than 60% of transistors in many 
CISC processors. However, in the case of RISC, due to their small set of instructions, they are 
implemented using the hardwire method. Hardwiring of RISC instructions takes no more 
than 10% of the transistors. It is interesting to note that in the Pentium, a CISC processor, 
the V-pipe executes only simple instructions and is hardwired while the U-pipe executes 
any of the x86 instructions and uses microinstructions. 


Harvard and von Neumann architectures 


Every microprocessor must have memory space to store program (code) and data. 
While code provides instructions to the CPU, the data provides the information to be 
processed. The CPU uses buses (wire traces) to access the code ROM and data RAM 
memory spaces. The early computers used the same bus for accessing both the code and 
data. Such an architecture is commonly referred to as von Neumann (Princeton) architec- 
ture. That means for von Neumann computers, the process of accessing code or data could 
cause each to get in the other’s way and slow down the processing speed of the CPU, 
because each had to wait for the other to finish fetching. To speed up the process of pro- 
gram execution, some CPUs use what is called Harvard architecture. The Harvard archi- 
tecture has separate buses for the code and data memory. See Figure 23-9. That means that 
we need four sets of buses: (1) a set of data buses for carrying data into and out of the 
CPU, (2) a set of address buses for accessing the data, (3) a set of data buses for carrying 
code into the CPU, and (4) an address bus for accessing the code. See Figure 23-9. This 




Figure 23-9. von Neumann vs. Harvard Architecture 


is easy to implement inside an IC chip such as a Pentium where both code and data are 
internal (on-chip) and distances are on the micron and millimeter scale. But implement- 
ing Harvard architecture for system boards such as x86 PC-type computers is very expen- 
sive because the RAM memories that hold code and data are external to the CPU. Separate 
wire traces for data and code on the motherboard will make the board large and expen- 
sive. For example, a Pentium microprocessor with a 64-bit data bus and a 32-bit address 
bus will need about 100 wire traces on the motherboard if it is von Neumann architecture 
(96 for address and data, plus a few others for control signals of read and write and so on). 
But the number of wire traces will double to 200 if we use Harvard architecture. Harvard 
architecture will also necessitate a large number of pins coming out of the microproces- 
sor itself. For this reason Harvard architecture is not implemented in the world of PC and 
workstation motherboards. This is also the reason that microprocessors such as Pentium 
use Harvard architecture internally, but they still use von Neumann architecture when they 
access external memory. See Chapter 24. The von Neumann architecture was developed 
at Princeton University, while the Harvard architecture was the work of Harvard 
University. 
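
The trace-count arithmetic in the Pentium example above can be sketched in a few lines of Python. The function name bus_traces and the figure of 4 control lines are assumptions made only for illustration; the text says just that there are "a few others" for control signals.

```python
def bus_traces(data_bits, address_bits, control_lines=4, harvard=False):
    """Estimate the motherboard wire traces needed for the external buses.

    control_lines=4 is an assumed round figure standing in for the
    "few others" (read, write, and so on) mentioned in the text.
    """
    one_set = data_bits + address_bits + control_lines
    # Harvard architecture needs a second, independent set of buses for code.
    return 2 * one_set if harvard else one_set


von_neumann = bus_traces(64, 32)             # 64 + 32 + 4
harvard = bus_traces(64, 32, harvard=True)   # two full sets
print(von_neumann, harvard)
```

With a 64-bit data bus and a 32-bit address bus this gives roughly 100 traces for von Neumann and 200 for Harvard, matching the estimate in the text.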


Comparison of sample program for RISC and CISC 


Since RISC has established itself as a dominant architecture, an example will be 
given of a program written for Intel's x86 CISC and a RISC processor. Then they will be 
compared. The next program example will compare total clocks for a program that trans- 
fers a block of 32-bit words from some memory location to another memory location. 
First, several points about the RISC processor must be discussed. 


1. In instructions such as "add r3,r5,r2", r3 is added to r5 and the result is placed in r2.
This is in contrast to the x86, in which the destination register is the first register.

2. r0 is always equal to zero, regardless of the operation performed on it. 

3. The load instruction cannot be followed by the store instruction, which tries to use 
the value that is just being loaded: in other words, no read after write (RAW). 

4. Some instructions, such as branch instructions, are delayed, which means that the 




next instruction after the branch will be executed since the pipeline already has 
fetched it before the branch is taken; therefore, if we cannot put a useful instruction 
after the branch, a NOP should be used. 

5. There are several ways of encoding the NOP instruction. One is "add r0,r0,r0",
which adds r0 to r0 and places the result in r0; since r0 is always zero, the instruc-
tion does nothing but waste time. Another would be to use shift left, such as "shl
r0,r0,r0".

6. Some other instruction must be used to accomplish a MOV. 

7. The logic instruction is used to perform the compare job. 


Now look at the following RISC program, first written with total disregard of 
code scheduling and then written with code scheduling. The number of clocks for one 
round of loop is calculated in each case. 


;this is a program for a RISC processor to transfer a
;block of 20 dwords (each 32-bit) from memory locations starting at
;the address pointed at by r3 to memory locations pointed at by r4.
;r2 is the counter
;There is no code scheduling in the following example.

                                                                  clocks
      or     20,r0,r2   ;load the r2 with 20 (count=20)
bak:  ld.l   0(r3),r5   ;load r5 from content of mem loc 0+r3         1
      add    r0,r0,r0   ;NOP since r5 cannot be used by store         1
      st.l   r5,0(r4)   ;store r5 into mem loc of 0+r4                1
      add    4,r3,r3    ;point at the next dword source data          1
      add    4,r4,r4    ;point at the next dword destin data          1
      add    -1,r2,r2   ;decrement counter: r2=r2-1                   1
      or     r0,r2,r2   ;set condition code to high if r2=0           1
      bnc.t  bak        ;go to bak if CC=0. execute next instruction  1
      add    r0,r0,r0   ;NOP for delayed branch                       1
                        total clocks for one loop iteration           9


In the above program, ld.l and st.l are for 32-bit operands. For byte and 16-bit
operands, they would be ld.b and st.b, ld.s and st.s, respectively. Next the same program
is juggled around and the NOPs are removed for better performance. As mentioned in the
last section, this juggling is called code scheduling.


                                                        number of clocks
      or     20,r0,r2   ;load the r2 with 20 (count=20)
bak:  ld.l   0(r3),r5   ;                                             1
      add    4,r3,r3    ;                                             1
      st.l   r5,0(r4)   ;                                             1
      add    -1,r2,r2   ;                                             1
      or     r0,r2,r2   ;                                             1
      bnc.t  bak        ;                                             1
      add    4,r4,r4    ;                                             1

                        total clock count for one iteration           7


It is assumed that the above program is run on a RISC processor with no super- 
scalar capability. Using a superscalar RISC will cut the clock count to half, or 4 clocks for 
the same program. This is much better than the 386 and 486 microprocessors shown in 
Example 23-3. It is comparable with the Pentium as shown in Example 23-4. While the 




Pentium uses 3.1 million transistors to achieve such an impressive performance, a RISC
processor with the same performance level can be designed using less than 1 million tran-
sistors. There is only one problem. It will not run the massive number of software packages
written for the x86 Windows PC.
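
The clock counts for the two versions of the block-transfer loop can be checked with a small simulation. The following Python sketch is only an illustrative model, not a real RISC toolchain: the tuple encoding of instructions, the function names, and the memory addresses 100H and 200H are all assumptions. It models r0 as always zero, the one-instruction load delay, and the delayed branch, and charges one clock per instruction.

```python
def val(regs, x):
    """Operand value: ints are immediates, strings are register names."""
    return x if isinstance(x, int) else regs[x]

def reads(ins):
    """Registers an instruction reads (used for the load-delay check)."""
    op = ins[0]
    if op in ("add", "or"):                  # (op, src1, src2, dest)
        return {x for x in ins[1:3] if isinstance(x, str)}
    if op == "ld":                           # ("ld", offset, base, dest)
        return {ins[2]}
    if op == "st":                           # ("st", src, offset, base)
        return {ins[1], ins[3]}
    return set()

def execute(ins, regs, mem, flags):
    op = ins[0]
    if op in ("add", "or"):
        a, b = val(regs, ins[1]), val(regs, ins[2])
        result = a + b if op == "add" else a | b
        regs[ins[3]] = result
        if op == "or":
            flags["cc"] = (result == 0)      # CC goes high when the result is 0
    elif op == "ld":
        regs[ins[3]] = mem[regs[ins[2]] + ins[1]]
    elif op == "st":
        mem[regs[ins[3]] + ins[2]] = regs[ins[1]]
    regs["r0"] = 0                           # r0 always reads as zero

def run(program, regs, mem):
    """Execute the program; every instruction costs one clock."""
    flags, pc, clocks, just_loaded = {"cc": False}, 0, 0, None
    while pc < len(program):
        ins = program[pc]
        clocks += 1
        if ins[0] == "bnc.t":                # ("bnc.t", target_index)
            taken = not flags["cc"]
            execute(program[pc + 1], regs, mem, flags)   # delay slot always runs
            clocks += 1
            pc = ins[1] if taken else pc + 2
            just_loaded = None
            continue
        if just_loaded in reads(ins):
            raise RuntimeError("loaded value used one instruction too early")
        execute(ins, regs, mem, flags)
        just_loaded = ins[3] if ins[0] == "ld" else None
        pc += 1
    return clocks

NOP = ("add", "r0", "r0", "r0")
unscheduled = [
    ("or", 20, "r0", "r2"),                  # count = 20
    ("ld", 0, "r3", "r5"),                   # bak: (index 1)
    NOP,
    ("st", "r5", 0, "r4"),
    ("add", 4, "r3", "r3"),
    ("add", 4, "r4", "r4"),
    ("add", -1, "r2", "r2"),
    ("or", "r0", "r2", "r2"),
    ("bnc.t", 1),
    NOP,
]
scheduled = [
    ("or", 20, "r0", "r2"),
    ("ld", 0, "r3", "r5"),                   # bak: (index 1)
    ("add", 4, "r3", "r3"),                  # fills the load delay
    ("st", "r5", 0, "r4"),
    ("add", -1, "r2", "r2"),
    ("or", "r0", "r2", "r2"),
    ("bnc.t", 1),
    ("add", 4, "r4", "r4"),                  # fills the branch delay slot
]

def copy_clocks(program):
    """Run one version on 20 test dwords; return (total clocks, copy correct?)."""
    mem = {0x100 + 4 * i: 1000 + i for i in range(20)}
    regs = {f"r{i}": 0 for i in range(8)}
    regs["r3"], regs["r4"] = 0x100, 0x200
    clocks = run(program, regs, mem)
    copied = all(mem.get(0x200 + 4 * i) == 1000 + i for i in range(20))
    return clocks, copied

for name, program in (("unscheduled", unscheduled), ("scheduled", scheduled)):
    clocks, copied = copy_clocks(program)
    print(name, (clocks - 1) // 20, "clocks per iteration, copy ok:", copied)
```

Both versions copy the same 20 dwords; the only difference is that useful instructions replace the two NOPs, which takes the loop from 9 clocks to 7 per iteration. Issuing two instructions per clock on a superscalar machine would bring 7 clocks down to about 4, which is the figure quoted above.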


IBM/Freescale RISC 


IBM and Freescale (formerly Motorola) together have a RISC processor called 
the Power PC 60x. It uses only 2.8 million transistors, with a power consumption of 8.5 
watts versus 16 watts in the Pentium. Apple used the Power PC RISC for the last genera- 
tions of Macintosh computers. However, recently they switched to x86 processors. No PC 
maker has used the RISC processors, since RISC computers must emulate to run 
Windows based software rather than running native. In other words, while MS Windows 
runs native on the x86, the RISC processor or any non-x86 has no choice but to emulate. 
In addition to the Power PC 60x RISC processor, there are some other notable RISC 
processors vying for a share of the embedded market. Among them are ARM and PIC32. 

A comparison of the major characteristics of the Pentium and the Power PC 601 
is provided in Table 23-3. To take full advantage of the power of RISC, software devel- 
opers must write applications specifically for RISC, rather than using emulation. 


Table 23-3: Pentium vs. Power PC 601

Feature                                         Pentium        Power PC 601
Number of transistors (million)                 3.1            2.8
Power dissipation at 66 MHz (watts)             16             8.5
Die size (mm^2)                                 262
Number of instructions issued per clock cycle   2              3 (one is FP)
Architecture                                    Superscalar    Superscalar


Before concluding this discussion of RISC processors, it is interesting to note that
RISC technology was explored by the scientists in IBM in the mid-1970s, but it was
David Patterson of the University of California at Berkeley who in 1980 brought the mer-
its of RISC concepts to the attention of computer scientists.


Review Questions 


1. What do RISC and CISC stand for? 

2. True or false. The 386 executes the vast majority of its instructions in 3 clock
cycles, while RISC executes them in 1 clock.

3. RISC processors normally have ____ general-purpose registers, each ____ bits.

4. True or false. Instructions such as "ADD AX,[DI]" do not exist in RISC. 

5. What is the size of instructions in RISC? 

6. True or false. While CISC instructions are variable sizes, RISC instructions are all 
the same size. 

7. Which of the following operations do not exist for the ADD instruction in RISC? 
(a) register to register (b) immediate to register 
(c) memory to register 




8. How many floating-point registers do we have in RISC and the 80x87?

9. Why can floating-point registers in RISC be configured as 64-bit?

10. True or false. Harvard architecture uses the same address and data buses to fetch 
both opcode and data. 


SECTION 23.4: PENTIUM PRO PROCESSOR 


In this section we discuss the main features of Intel's Pentium Pro processor. 
Intel's Pentium Pro is the sixth generation of the x86 family of microprocessors. For this 
reason, early literature about this chip referred to it as P6. Intel officially calls this chip 
Pentium Pro to emphasize its superiority over the previous Pentium generation. Intel used 
5.5 million transistors to make the Pentium Pro. The first Pentium Pro introduced in 1995 
had a speed of 150 MHz and consumed 23 watts of power at that speed. Since then, Intel
has introduced Pentium Pro chips with higher speeds and various power consumption rat- 
ings. 

There are no major surprises in the Pentium Pro in the sense that it runs all the 
software written for the 8088/86, 286, 386, 486, and Pentium microprocessors and its 32- 
bit registers are exactly the same as those of the 386. In other words, the register size was 
not increased to 64 bits as has been done by some RISC processors such as Digital 
Equipment's Alpha chip. 

For the first time, Intel also attached level 2 (L2) cache to the Pentium Pro all on 
a single package but with two separate dies. This packaging is called dual cavity by Intel. 
The integration of a 256K-byte L2 cache with the processor into a single package reduces 
interchip delay between the L2 cache and the CPU. While such an integration cut memo- 
ry access delay, it also made many SRAM makers mad since they lost another chunk of 
PC business to Intel. Notice that the Pentium Pro CPU has only 16K bytes of L1 cache on 
the same die, just like the Pentium processor, while 256 KB (or 512 KB) L2 cache is on 
the separate die. In addition to the 5.5 million transistors used for the Pentium Pro CPU 
and its 16 KB L1 cache, the L2 cache uses over 10 million transistors depending on the 
size of L2 cache. See Table 23-4 for further comparison of Pentium and Pentium Pro 
processors. 


Table 23-4: Comparison of Pentium and Pentium Pro 


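Feature                 Pentium                        Pentium Pro
Year introduced         1993                           1995
Number of transistors   3.1 million                    5.5 million
Number of pins          273                            387
External data bus       64 bits                        64 bits
Address bus             32 bits                        36 bits
Physical memory         4 GB                           64 GB
Virtual memory          64 TB                          64 TB
Data types              8, 16, 32 bits                 8, 16, 32 bits
L1 cache                16K bytes (data 8K, code 8K)   16K bytes (data 8K, code 8K)
L2 cache                External                       256KB/512KB
Branch prediction       Yes                            Yes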




Figure 23-10. Pentium Pro Instruction Execution
(blocks shown: instruction cache, fetch/decode unit, dispatch/execute unit, retire unit,
instruction pool)


Pentium Pro: internal architecture


Intel finally yielded to the rise of RISC concepts in the design of the Pentium Pro. 
In the Pentium Pro, all x86 instructions brought into the CPU are broken down into one 
or more small and easy-to-execute instructions. These easily executable instructions are 
called micro-operations (uops) by Intel. This is similar to the concept in RISC except that 
in RISC architecture the instruction set is very simple and easy to execute, and the instruc- 
tions stored in memory are exactly the same as the ones inside the CPU. In contrast, Intel 
had to maintain code compatibility for the Pentium Pro with all previous x86 processors, 
all the way back to 8086. Therefore, Intel had no choice but to convert the x86 instruc- 
tions produced by the compiler/assembler into micro-operations internally inside the 
CPU. An interesting aspect of converting x86 instructions into micro-ops internally is that 
it uses what is called triadic instruction format. In triadic instruction format, there are two 
source registers and one destination register. An example of triadic format is "ADD 
R1,R5,R8" in which registers R1 and R5 are added together and placed in R8. The con- 
tents of source registers R1 and R5 are not altered. Contrast this with "ADD AX,BX" in 
which there is only one source register (BX). For more examples of the triadic instruction 
format, see Section 23.3 on the RISC processors. 

The use of a triadic instruction set in the Pentium Pro architecture means that a 
large number of registers inside the Pentium Pro are not accessible or visible to the pro- 
grammer. In other words, as far as the programmer of the Pentium Pro (or compiler) is 
concerned, only the traditional register set EAX, EBX, ECX, and so on is available and 
visible. This ensures that compatibility with previous generations of the x86 is maintained. 
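
The difference between the x86 two-operand form and the triadic form can be illustrated with a minimal Python sketch. The function names and the register values are made up for illustration, and the 32-bit width assumed for the internal registers is also an assumption.

```python
def add_dyadic(regs, dest, src):
    """x86 style 'ADD dest,src': dest is both a source and the destination."""
    regs[dest] = (regs[dest] + regs[src]) & 0xFFFF          # 16-bit AX, BX

def add_triadic(regs, src1, src2, dest):
    """Triadic style 'ADD src1,src2,dest': both sources are left untouched."""
    regs[dest] = (regs[src1] + regs[src2]) & 0xFFFFFFFF     # assumed 32-bit


x86 = {"AX": 5, "BX": 7}
add_dyadic(x86, "AX", "BX")             # ADD AX,BX: the old AX value is lost

uop = {"R1": 5, "R5": 7, "R8": 0}
add_triadic(uop, "R1", "R5", "R8")      # ADD R1,R5,R8: R1 and R5 survive
print(x86, uop)
```

Because the triadic form never overwrites a source, a later micro-op that still needs R1 or R5 does not have to wait for, or be ordered around, this addition.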


Pentium Pro is both superpipelined and superscalar 


As mentioned above, in the Pentium Pro all x86 instructions are converted into 
micro-ops with triadic formats before they are processed. This conversion allows an 
increase in the pipeline stages with little difficulty. Intel uses a 12-stage pipeline for the 
Pentium Pro. In contrast to the 5-pipestage Pentium, although each pipestage of the 12- 
pipestage Pentium Pro performs less work, there are more stages. This means that in the 
Pentium Pro, more instructions can be worked on and finished at a time. The Pentium Pro 
with its 12-stage pipeline is referred to as superpipelined. Since it also has multiple exe- 
cution units capable of working in parallel, it is also superscalar. Another advantage of 
the 12-pipestage concept is that it can achieve a higher clock rate (frequency) with the 




given transistor technology. This is one reason that the earliest Pentium chips had a fre- 
quency of only 60 MHz while the earliest Pentium Pro has a frequency of 150 MHz. Intel 
also used what is called out-of-order execution to increase the performance of the Pentium 
Pro. This is explained next. 


What is out-of-order execution? 


In Pentium architecture, when one of the pipeline stages is stalled, the prior stages 
of fetch and decode are also stalled. In other words, the fetch stage stops fetching instruc- 
tions if the execution stage is stalled, due, for example, to a delay in memory access. This 
dependency of fetch and execution has to be resolved in order to increase CPU perform- 
ance. That is exactly what Intel has done with the Pentium Pro and is called decoupling 
the fetch and execution phases of the instructions. In the Pentium Pro, as x86 instructions 
are fetched from memory they are decoded (converted) into a series of micro-ops, or 
RISC-type instructions, and placed into a pool called the instruction pool. See
Figure 23-10. This fetch/decode of the instructions is done in the same order as the program
was coded by the programmer (or compiler). However, when the micro-ops are placed in 
the instruction pool they can be executed in any order as long as the data needed is avail- 
able. In other words, if there is no dependency, the instructions are executed out of order, 
not in the same order as the programmer coded them. In the case of the Pentium Pro, the 
dispatch/execute unit schedules the execution of micro-ops from the instruction pool sub-
ject to the availability of needed resources and stores the results temporarily. Such a spec-
ulative execution can go 20-30 instructions deep into the program. It is the job of the 
retire unit to provide the results to the programmer's (visible) registers (e.g., EAX, EBX) 
according to the order in which the instructions were coded. Again, it is important to note 
that the instructions are fetched in the same order that they were coded, but executed out 
of order if there is no dependency, and ultimately retired in the same order as they were 
coded. This out-of-order execution can boost performance in many cases. Look at 
Example 23-6. 

Due to the fact that memory fetches (due to cache misses) can take many clock 
cycles and result in underutilization of the CPU, out-of-order execution is a way of find- 
ing something to do for the CPU. Simply put, the idea of out-of-order execution is to look 
deep into the stream of instructions and find the ones that can be executed ahead of oth- 
ers, providing that resources are available. Again, it is important to note that the Pentium 


Example 23-6 


For the following code, indicate the instructions that can be executed out of order in the Pentium 
Pro. 


(1) LOAD (R2),R4     ;LOAD R4 FROM MEMORY POINTED AT BY R2
(2) ADD  R1,R4,R7    ;R1+R4==>R7
(3) ADD  R6,R8,R10   ;R6+R8==>R10
(4) SUB  R5,R1,R9    ;R5-R1==>R9
(5) ADD  R6,R6,R12   ;R6+R6==>R12


Solution: 


Instruction i2 cannot be executed until the data is brought in from memory (either cache or main
memory DRAM). Therefore, i2 is dependent on i1 and must wait until the R4 register has the data.
However, instructions i3, i4, and i5 can be executed out of order and in parallel with each other since
there is no dependency among them. After the execution of i2, all the instructions i2, i3, i4, and i5
can be retired instantly since they all have been executed already. This would not be the case if these
instructions were executed in the Pentium since its pipeline would be stalled due to the memory
access for R4. In that case, instructions i3, i4, and i5 could not even be fetched, let alone decoded
and executed.
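
The reasoning in this solution amounts to checking which instructions read a register that an earlier instruction writes. The Python sketch below is a simplified illustration of that check and not a model of the actual Pentium Pro scheduler. The function name is hypothetical, only read-after-write dependencies are considered, and the operand lists are assumptions chosen to match the dependencies described in the solution (i2 needs the R4 loaded by i1; i3, i4, and i5 need nothing produced by the others).

```python
def raw_dependencies(program):
    """Map each instruction to the earlier instructions whose result it reads.

    program: list of (name, source_registers, destination_register) tuples
    in program order. Only read-after-write dependencies are tracked.
    """
    last_writer = {}        # register -> name of the latest instruction writing it
    deps = {}
    for name, sources, dest in program:
        deps[name] = sorted({last_writer[r] for r in sources if r in last_writer})
        last_writer[dest] = name
    return deps


# Operands below are illustrative assumptions, not a verified listing.
example = [
    ("i1", ["R2"], "R4"),           # LOAD: R4 comes from memory at (R2)
    ("i2", ["R1", "R4"], "R7"),     # needs the R4 produced by i1
    ("i3", ["R6", "R8"], "R10"),
    ("i4", ["R5", "R1"], "R9"),
    ("i5", ["R6", "R6"], "R12"),
]

deps = raw_dependencies(example)
independent = [name for name, needed in deps.items() if not needed]
print(deps)
print("free to execute without waiting:", independent)
```

Only i2 has to wait (for i1); the rest can be dispatched as soon as execution resources are free, while retirement still happens in program order.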




Example 23-7 


The following x86 code (a) sets the pointer for three different arrays, and the counter value, (b) gets 
each element of ARRAY_1, adds a fixed value of 100 to it, and stores the result in ARRAY_2, and 
(c) complements the element and stores it in ARRAY_3. Analyze the execution of the code in light
of the out-of-order execution and branch prediction capabilities of the Pentium Pro. 


(1)          MOV   EBX,OFFSET ARRAY_1   ;LOAD POINTER
(2)          MOV   ESI,OFFSET ARRAY_2   ;LOAD POINTER
(3)          MOV   EDI,OFFSET ARRAY_3   ;LOAD POINTER
(4)          MOV   ECX,COUNT            ;LOAD COUNTER
(5)  AGAIN:  MOV   EAX,[EBX]            ;LOAD THE ELEMENT
(6)          ADD   EAX,100              ;ADD THE FIX VALUE
(7)          ADD   EBX,4                ;UPDATE THE POINTER
(8)          MOV   [ESI],EAX            ;STORE THE RESULT
(9)          ADD   ESI,4                ;UPDATE THE POINTER
(10)         NOT   EAX                  ;COMPLEMENT THE RESULT
(11)         MOV   [EDI],EAX            ;AND STORE IT
(12)         ADD   EDI,4                ;UPDATE THE POINTER
(13)         LOOP  AGAIN                ;STAY IN THE LOOP
(14)         MOV   AX,4C00H             ;EXIT
(15)         INT   21H


Solution: 


The fetch/decode unit fetches and converts instructions into micro-ops. Since there is no dependen-
cy for instructions i1 through i5, they are dispatched, executed, and retired except for i5. Notice that
the pointer values are immediate values; therefore, they are embedded into the instruction when the 
fetch/decode unit gets them. Now i5 is a memory fetch that can take many clocks, depending on 
whether the needed data is located in cache or main memory. Meanwhile i6, i8, i10, and i11 must
wait until the data is available. However i7, i9, and i12 can be executed out of order knowing that 
the updated values of pointers EBX, EDI, and ESI are kept internally until the time comes when they 
will be committed to the visible registers by the retire unit. More importantly, the LOOP instruc- 
tion is predicted to go to the target address of AGAIN and i5, i6, ... are dispatched once more for the 
next iteration. This time the memory fetch will take very few clocks since in the previous data fetch, 
the CPU read at least 32 bytes of data using the Pentium Pro 64-bit (8 bytes) data bus and the burst 
read mode, transferring four sets of 8-byte data into the CPU. This process will go on until the last 
round of the LOOP instruction where ECX becomes zero and falls through. At this time, due to mis- 
prediction, all the micro-instructions belonging to instructions i5, i6, i7, ... (start of the loop) are 
removed and the whole pipeline restarts with instructions belonging to i14, i15, and so on. 
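
As a cross-check on what the loop computes and on the burst-read remark, here is a short Python sketch. It models only the arithmetic, not the pipeline. The function names are hypothetical, and slow_fetches assumes a cold cache, an array aligned on a 32-byte boundary, and one 32-byte burst per miss.

```python
MASK32 = 0xFFFFFFFF

def run_loop(array_1, addend=100):
    """Return (ARRAY_2, ARRAY_3) as the loop in Example 23-7 would fill them."""
    array_2, array_3 = [], []
    for element in array_1:                  # i5:  MOV EAX,[EBX]
        eax = (element + addend) & MASK32    # i6:  ADD EAX,100 (32-bit wraparound)
        array_2.append(eax)                  # i8:  MOV [ESI],EAX
        eax = ~eax & MASK32                  # i10: NOT EAX
        array_3.append(eax)                  # i11: MOV [EDI],EAX
    return array_2, array_3

def slow_fetches(count, element_bytes=4, line_bytes=32):
    """Loads that go out to memory if each miss brings in one 32-byte burst."""
    per_burst = line_bytes // element_bytes  # 8 dwords per burst (4 x 8 bytes)
    return -(-count // per_burst)            # ceiling division


array_2, array_3 = run_loop([0, 1, 0xFFFFFFFF])
print([hex(v) for v in array_2], [hex(v) for v in array_3])
print(slow_fetches(20))
```

Under these assumptions only about one load in eight pays the full memory latency, which is why the later iterations of the loop see the data fetch complete in very few clocks.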


Pro will not immediately provide the results of out-of-order executions to programmer- 
visible registers such as EAX, EBX, and so on, since it must maintain the original order 
of the code. Instead, the results of out-of-order executions are stored in the pool and wait 
to be retired in the same order as they were coded. Therefore, programmer-visible regis- 
ters are updated in the same sequence as expected by the programmer. 


Branch prediction 


The Pentium Pro, like the Pentium before it, has branch prediction, but with 
greater capability. When the Pentium Pro encounters branch instructions (such as JNZ), it
creates a list of them in what is called the branch target buffer (BTB). The BTB predicts 
the target of the branch and starts executing from there. When the branch is executed, the 




result is compared with what the prediction section of the CPU said it would do. If they 
match, the branch is retired. If not, all instructions behind the branch are removed from 
the pool and the correct branch target address is provided to the BTB. From there the BTB 
refills the pipeline with instructions from the new target address. See Example 23-7. 
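
To see why a loop branch is mispredicted only on its final pass, consider the following Python sketch. It uses a generic 2-bit saturating counter, a common textbook prediction scheme chosen here only as an assumption; it is not presented as the actual algorithm inside the Pentium Pro BTB. The function name and the starting counter value are also assumptions.

```python
def count_mispredictions(outcomes, counter=3):
    """Predict each branch outcome with one 2-bit saturating counter (0..3).

    A counter of 2 or 3 predicts 'taken'. The default starting state of 3
    (strongly taken) is an assumption.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move toward 3 on a taken branch, toward 0 on a not-taken branch.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# LOOP AGAIN with ECX=20: taken 19 times, then it falls through once.
loop_branch = [True] * 19 + [False]
print(count_mispredictions(loop_branch))
print(count_mispredictions(loop_branch, counter=0))
```

Starting from "strongly taken", the only misprediction is the fall-through at the end of the loop, which is the case described in Example 23-7 where the pipeline must be flushed and restarted.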
Note the following points concerning the reordering of store instructions from 
Intel documentation: "Stores are never performed speculatively since there is no transpar- 
ent way to undo them. Stores are also never re-ordered among themselves. A store is dis- 
patched only when both the address and the data are available and there are no older stores 
awaiting dispatch." 


Bus frequency vs. internal frequency in Pentium


Frequently you may see an advertisement for a 1-GHz or 2-GHz Pentium PC. It 
is important to note that the stated frequency is the internal frequency of the CPU and not 
the bus frequency. This is due to the fact that designing a 1-GHz motherboard is very dif- 
ficult and expensive. Such a design requires a very fast logic family and memory in addi- 
tion to a massive simulation to avoid crosstalk and signal radiation. The bus frequency for 
such systems is currently less than 1 GHz. 


Review Questions 


1. Pentium Pro is the official name designated by Intel. What was it called before 
such a designation? 

2. True or false. Both the Pentium and Pentium Pro have 16KB L1 cache. 

3. True or false. Both the Pentium and Pentium Pro have L2 cache on the same pack- 
age. 

4. True or false. The x86 instruction set is in triadic form. 

5. Which of the x86 processors has out-of-order execution? 

6. Which unit inside the Pentium Pro commits the final results of operations to regis- 
ters EAX, EBX, and so on? 

7. True or false. The Pentium Pro is a superpipelined processor. 


SECTION 23.5: MMX TECHNOLOGY 


In this section, we discuss the MMX (MultiMedia extension) technology used in 
some of the Intel processors. 


DSP and multimedia 


To run high-quality multimedia applications with sound and graphics requires 
very fast and sophisticated mathematical operations. Such complex operations are normal- 
ly performed by a highly specialized chip called DSP (digital signal processing). DSP 
chips are the main engines performing tasks such as 2D and 3D graphics, video and audio 
compression, fax/modem, PC-based telephoning with live pictures, and image processing. 

There are three approaches to equip the PC with DSP capability. 


1. Use a full-fledged DSP chip on the board along with the main CPU. This is the best 
and ideal approach since there are some very powerful DSP chips out there. 
However, the problem is that there is no industry-wide standard to be followed by 
the PC designers and the lack of such a standard can lead to incompatibility both in 
hardware and software. 

2. Use the x86 and x87 FP (floating-point) instructions to emulate the function of 
DSP. This is slow and the performance is unacceptable. 

3. The third approach is to incorporate some DSP functions into the x86 microproces- 
sor. This approach leaves everyone at the mercy of Intel, yet it brings compatibility 
and a unified approach to the issue. Although the performance is not as good as 
with the first approach, it is much better than the second approach. 

The third approach is exactly what happened. In early 1997 Intel introduced a 


series of Pentium and Pentium Pro chips with somewhat limited DSP capability called 
MMX technology. In the case of Intel's MMX technology, software compatibility, both on 
the BIOS and operating system levels, was the most important goal. It needs to be noted 
that although MMX does not have the rich set of instructions normally associated with 
DSP chips such as Texas Instruments' DSP chips, it still performs many of the DSP func- 
tions reasonably well. 


Register aliasing by MMX 


As stated earlier, one of the main goals of MMX technology was to maintain com- 
patibility with other x86 processors with no MMX capability. To assure that, Intel uses 
the FP (floating-point) register set of the x87 math coprocessor as the working register for 
MMX instructions instead of introducing a whole new set of registers. This is called reg- 
ister aliasing, meaning that the same physical register has different names. While the x87 
FP registers are 80 bits wide, the MMX uses only 64 bits of it. The x87 floating-point reg- 
isters are called ST(0), ST(1), ..., ST(7) when they are used by the x87 instruction set, but 
the same registers are called MM0, MM1, ..., MM7 when used by the MMX portion of
the CPU. See Figure 23-11. Register aliasing by MMX has some major implications:


1. We must not use the registers to store MMX data and FP (floating point) data at the 
same time since they are the same physical registers. 

2. We must not mix MMX instructions with FP instructions. Mixing MMX and FP
instructions slows down the application since it takes many clock cycles to switch 
between MMX and x87 instructions. The best method is to have separate program 
modules for x87 instructions and MMX instructions with no intermixing. 

3. When leaving an MMX program module, make sure that all the MMX registers are 
cleared before issuing any x87 instructions. The same is true if switching from x87 
to MMX. All FP registers must be popped to leave them empty.

4. As shown in Chapter 20, FP registers are accessed by the x87 instructions in the 
stack format. However, when these same registers are accessed by the MMX 
instruction set, each one is accessed directly by its name, MM0-MM7. These 
MMX registers cannot be used to address memory and must be used only to per- 
form calculations on data. 


Figure 23-11. MMX Register Set


Figure 23-12. MMX Data Types
(shown, bits 63 to 0: packed bytes (8x8 bits), packed words (4x16 bits),
packed doublewords (2x32 bits), quadword (64 bits))


Data types in MMX


As mentioned earlier, the MMX uses only 64 bits of the 80-bit wide FP registers. 
Therefore, the largest MMX data size is 64-bit. However, the 64-bit register can be used 
for four different data types. See Figure 23-12. They are as follows. 


1. Quadword (one 64-bit)
2. Packed doubleword (two 32-bit)
3. Packed word (four 16-bit)
4. Packed byte (eight 8-bit)


All four data types of the MMX are integers and are referred to as packed data. It 
must be noted that the contents of the MMX registers can be treated as any of the four dif- 
ferent types of eight bytes, four words, two doublewords, or one quadword. It is the job 
of the MMX instruction to specify the data type. For example, the instruction Packed Add 
has three different formats depending on the data type. They are as follows: 


PADDB (Add Packed Byte)         adds two groups of 8 packed bytes
PADDW (Add Packed Word)         adds two groups of 4 packed words
PADDD (Add Packed Doubleword)   adds two groups of 2 packed doublewords
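
What "packed add" means can be emulated in ordinary Python integers. This is only a sketch of the arithmetic and not MMX code: the helper packed_add and the lowercase wrappers are hypothetical names, and each lane is assumed to wrap around on overflow, with no carry passing into the neighboring lane.

```python
def packed_add(a, b, lane_bits):
    """Add two 64-bit values lane by lane; each lane wraps around on overflow."""
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        # The low lane_bits of the sum depend only on this lane of a and b.
        lane = ((a >> shift) + (b >> shift)) & mask
        result |= lane << shift
    return result

def paddb(a, b):
    return packed_add(a, b, 8)      # eight 8-bit lanes

def paddw(a, b):
    return packed_add(a, b, 16)     # four 16-bit lanes

def paddd(a, b):
    return packed_add(a, b, 32)     # two 32-bit lanes


# The same 64-bit inputs give different results depending on the data type.
print(hex(paddb(0x01FF, 0x0001)))   # low byte wraps, no carry into next byte
print(hex(paddw(0x01FF, 0x0001)))   # carry stays inside the 16-bit word
```

The same register contents produce different sums under PADDB, PADDW, and PADDD, which is why the instruction, not the register, specifies the data type.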


It is interesting to note that Intel has introduced to the x86 instruction set a total 
of 57 new instructions just for the MMX. Since currently there is no assembler or com- 
piler equipped with the MMX instruction set, we will not give any MMX programming 
example. To find out if a given Pentium is an MMX chip, we must use a Pentium instruc- 
tion called CPUID. This is discussed next. 


CPU identification for x86 


CPU identification is so important to the new generation of operating systems and 
software packages that starting with the Pentium, Intel has introduced a new instruction 
to do just that. The problem is how to identify microprocessors prior to the Pentium. 
According to Intel, to identify the microprocessor by way of software one must examine 
the bits of the flag register. Notice that in the 8088/86/286 the flag register is a 16-bit reg- 
ister but in the 386/486/Pentium it is a 32-bit register. Table 23-5 shows the status of the 
flag bits used in identifying the processor type. These bits can be examined at any time 
and not just at boot-up. 




Table 23-5: Flag Bits for CPU Identification 


CPU        Flag Bits
8088/86    Bits 12 through 15 are always 1.
80286      Bits 12 through 15 are always 0 (in real mode).
80386      Bit 18 is always 0 (in real and protected mode).
80486      If bit 21 cannot be changed, it is a 486; if bit 21 can be changed to 1 and 0,
           then it must be a Pentium.


Starting with the Pentium, one can use a new instruction, CPUID, to get information such
as family and model of the processor. However, it is the ability to set or reset bit 21 of
the flag that indicates whether the CPUID instruction is supported or not. The CPUID
instruction can be executed any time in protected mode or real mode.


For Intel's Pentium and higher microprocessors, prior to execution of the CPUID
instruction we must set EAX = 1. After the execution of CPUID, bits D8-D11 of EAX
have the family number. The family number is 5 for the Pentium and is 6 for the
Pentium Pro.


It must also be noted that there is no instruction in the x86 family that can 
exchange the contents of the flag register and a general-purpose register directly. 
Therefore, to examine the contents of the flag register in an x86, we must use the stack as 
outlined in the following steps: 


1. Push the flag register onto the stack.
2. Get (pop) it into a register such as AX, BX.
3. Manipulate bits d15-d12 (or any other bits).
4. Push it back onto the stack.
5. Pop it back into the flag register from the stack.
6. Push the flag back onto the stack again.
7. Get (pop) the new flag bits back into a register.
8. Examine bits d12-d15 to see if the changes in step 3 took effect.


The following code shows the above steps. 


Step 1)   PUSHF                ;push the flag into stack
Step 2)   POP   BX             ;and get it into BX
Step 3)   AND   BX,0FFFH       ;mask bits d15-d12
Step 4)   PUSH  BX             ;send it back into stack
Step 5)   POPF                 ;bring it back into flag reg
Step 6)   PUSHF                ;store the flag on stack
Step 7)   POP   BX             ;get it into BX again to examine
Step 8a)  AND   BX,0F000H      ;mask all bits except d12-d15
Step 8b)  CMP   BX,....


Notice in the above code that instructions "PUSHF" and "POPF" are used for
pushing and popping the 16-bit flag register. However, to access the 32 bits of the flag reg- 
ister in the 386 and Pentium processors, instructions PUSHFD and POPFD must be used. 
These steps are coded in Program 23-1. 
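
The decision sequence behind Table 23-5 can be modeled in Python to see how the write-then-read-back tests separate the processors. This is a simplified sketch and not a substitute for the assembly routine: the READBACK table and the function identify are hypothetical, and only the flag bits named in Table 23-5 are modeled.

```python
BIT18, BIT21 = 1 << 18, 1 << 21

# What each CPU returns when software writes a value to the flag register
# and reads it back (simplified: only the bits from Table 23-5 are modeled).
READBACK = {
    "8088/86": lambda v: (v & 0x0FFF) | 0xF000,   # d12-d15 always read as 1
    "286":     lambda v: v & 0x0FFF,              # d12-d15 always 0 (real mode)
    "386":     lambda v: v & ~(BIT18 | BIT21),    # bit 18 and bit 21 stuck at 0
    "486":     lambda v: v & ~BIT21,              # bit 21 cannot be changed
    "Pentium": lambda v: v,                       # bit 21 (ID bit) can be toggled
}

def identify(cpu):
    """Return the code used in the text: 0=8088/86, 1=286, 3=386, 4=486, 5=Pentium."""
    readback = READBACK[cpu]
    flags = readback(0)
    # Clear d12-d15; if they still read back as all 1s, it is an 8088/86.
    if (readback(flags & 0x0FFF) & 0xF000) == 0xF000:
        return 0
    # Try to set d12-d15; if they stay all 0s, it is a 286.
    if (readback(flags | 0xF000) & 0xF000) == 0:
        return 1
    # Try to flip bit 18; if it does not change, it is a 386.
    if ((readback(flags ^ BIT18) ^ flags) & BIT18) == 0:
        return 3
    # Try to flip bit 21; if it does not change, it is a 486.
    if ((readback(flags ^ BIT21) ^ flags) & BIT21) == 0:
        return 4
    return 5      # bit 21 toggles, so CPUID can be used for further detail


codes = [identify(cpu) for cpu in READBACK]
print(dict(zip(READBACK, codes)))
```

The order of the tests matters: each later test uses 32-bit flag operations that are only safe once the earlier tests have ruled out the 16-bit processors.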


CPUID instruction and MMX technology 


Not all Pentium and Pentium Pro microprocessors come with MMX technology. 
To find out if a microprocessor is equipped with MMX technology, we can use Pentium 
instruction CPUID. According to Intel, upon return from instruction CPUID, if D23 of
EDX is high, the CPU has MMX technology. MMX identification is performed in
Program 23-1. 
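
The two bit fields mentioned so far (family number in D8-D11 of EAX, MMX flag in D23 of EDX) can be extracted as shown in this small Python sketch. The function names are hypothetical, and the register values passed in are made-up samples for illustration, not readings from a real CPU.

```python
def family_number(eax):
    """Family number from bits D8-D11 of EAX after CPUID with EAX=1."""
    return (eax >> 8) & 0xF

def has_mmx(edx):
    """True if bit D23 of EDX is set after CPUID with EAX=1."""
    return bool(edx & (1 << 23))      # 1 << 23 is 00800000H


# Made-up sample register contents, for illustration only.
print(family_number(0x0543), has_mmx(0x008001BF))
print(family_number(0x0619), has_mmx(0x000001BF))
```

The mask 1 << 23 is the same constant, 00800000H, that the TEST instruction uses in Program 23-1.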




Program 23-1

;after making sure CPUID instruction is supported

        MOV   EAX,1              ;REQUEST FOR FEATURE FLAG
        CPUID                    ;CPUID INSTRUCTION
        TEST  EDX,00800000H      ;BIT 23 OF EDX INDICATES MMX
        JNZ   MMX_YES
        ...                      ;NO MMX
MMX_YES:


Note: To assemble the code below you need MASM 6.11, which sup-
ports Pentium instructions such as CPUID (the directive .586
is for that purpose). If you are using MASM 5.x, then remove
both the .586 directive and the CPUID instruction and replace them
with the opcode bytes for CPUID (0FH, 0A2H) in the following manner:

        DB    0FH,0A2H           ;opcode for CPUID instruction


;this routine identifies the PC'S 80X86 microprocessor 

upon return from this subroutine, AX contains microprocessor 
code 

;where 0=8088/86, 1=286, 3=386,4=486, 5=Pentium, 6=Pentium Pro 


GET_CPUID PROC
;see if it is 8088/86 by checking bits d12-d15 of flag reg
        PUSHF                    ;push the flag into stack
        POP   BX                 ;and get it into BX
        AND   BX,0FFFH           ;mask bits d15-d12
        PUSH  BX                 ;send it back into stack
        POPF                     ;bring it back into flag reg
        PUSHF                    ;store the flag back on stack again
        POP   BX                 ;and get it back into BX
        AND   BX,0F000H          ;mask all bits except d12-d15
        CMP   BX,0F000H          ;are d12-d15 still all 1s?
        MOV   AX,0               ;make AX=0 code for 8088/86
        JE    OVER               ;if yes then AX=0 code for 8088/86
;see if it is 80286 by checking bits d12-d15 of flag reg
        OR    BX,0F000H          ;if not, try setting d12-d15 to 1s
        PUSH  BX                 ;push it into stack
        POPF                     ;try to make d12-d15 of flag reg all 1s
        PUSHF                    ;get the flag back into stack
        POP   BX                 ;get it back to examine the bits
        AND   BX,0F000H          ;mask all bits except d12-d15
        CMP   BX,0               ;are d12-d15 still all 0s?
        MOV   AX,1               ;make AX=1 code for 286
        JE    OVER               ;if yes set AX=1 code for 286
;see if it is 386 by checking bit 18 of flag reg
        .386
        PUSHFD                   ;it is 386 or higher. push flag
        POP   EBX                ;and get it into EBX
        MOV   EDX,EBX            ;save it
        XOR   EBX,40000H         ;flip bit 18
        PUSH  EBX                ;send it into stack
        POPFD                    ;get it into flag
        PUSHFD                   ;get it back into stack
        POP   EBX                ;get the new flag back into EBX
        MOV   AX,3               ;make AX=3 code for 386
        XOR   EBX,EDX            ;see if bit 18 is toggled
        JE    OVER               ;if not toggled then AX=3 code for 386
;see if it is 486 or higher. try changing bit 21 of flag reg
        MOV   AX,4               ;if not it is 486 or higher (AX=4 for 486)
        PUSHFD                   ;see if bit 21 (ID bit) can be altered
        POP   EBX                ;in order to use the CPUID instruction
        MOV   EDX,EBX            ;save original flag bit in EDX
        XOR   EBX,200000H        ;flip bit 21
        PUSH  EBX                ;save it on the stack
        POPFD                    ;get it into flag reg
        PUSHFD                   ;get flag back into stack
        POP   EBX                ;get it into EBX to examine bit 21
        XOR   EBX,EDX            ;see if bit 21 changes
        JE    OVER               ;if it does not change, AX=4 code for 486

;see which Pentium (586 or 686) by using CPUID instruction
        MOV   EAX,1              ;set EAX=1 before executing CPUID
        .586                     ;use Pentium instruction
        CPUID
;after execution of CPUID, bits D8-D11 of EAX have family number
        .386                     ;back to 386 instructions
        AND   EAX,0F00H          ;mask all bits except the family bits
        SHR   EAX,8              ;move d8-d11 to lower nibble then AX=5
                                 ;for Pentium, 6 for Pentium Pro
        .8086
OVER:   RET                      ;return with AX=processor number
GET_CPUID ENDP


For the latest CPU identification program, see http://www.intel.com.


Review Questions 




1. Why do we not use the FP x87 instructions for DSP multimedia?
2. MMX is available for which of the x86 processors?
   (a) 486 (b) Pentium (c) Pentium Pro
3. True or false. MMX aliases the x86 registers.
4. What are the names of the MMX registers?
5. True or false. MMX instructions access registers in stack format.
6. How can a program determine if the processor is a 386?
7. To identify the 486 processor, toggle ____.
8. CPUID instruction was first introduced with the ____ processor.
9. To identify the Pentium, use CPUID with EAX = ____.
10. Can the CPUID instruction help determine if the chip supports MMX technology?
    If so, how?




PROBLEMS 


SECTION 23.1: THE 80486 MICROPROCESSOR 


1. The 486 chip uses ____ pins.

2. The 486 is a(n) ____-bit microprocessor.

3. What is the size of on-chip cache in the 486?

4. Off-chip cache is referred to as ____ cache.

5. True or false. On-chip cache in the 486 is used to hold both data and code.

6. State the differences between the 486 and 486SX microprocessors.

7. The 486 has a(n) ____-bit external data bus.

8. How many data parity pins does the 486 have?

9. The 486 can access ____ bytes of memory using the ____ address pins.

10. How many BE pins does the 486 have?

11. Nonburst read and write cycles take ____ clocks.

12. What does "2-2-2-2" cycle mean?

13. What does "2-1-1-1" cycle mean?

14. The 486 fetches ____ bytes of code and data into CPU using a burst cycle.
    How many clocks does the burst cycle take?

15. Calculate the bus bandwidth of a 486 with 25 MHz in each of the following.
    (a) nonburst cycle (b) burst cycle

16. Calculate the bus bandwidth of a 486 with 33 MHz in each of the following.
    (a) nonburst cycle (b) burst cycle

17. Which 486 instruction converts from the little endian to big endian, or vice versa?

18. Show how the data is placed in memory for the following program before and after
    the execution of BSWAP.
        MOV   EAX,23F46512H
        MOV   [4000],EAX
        BSWAP EAX
        MOV   [6000],EAX

19. If a 486 is advertised as 25 MHz, what clock frequency is connected to CLK?

20. What is the purpose of the A20M pin in the 486?

21. Assume that CS = FFFFH and IP = 76A0H. Calculate the physical address of the
    instruction in each of the following states.
    (a) A20M = 0 (b) A20M = 1

22. What is HMA, and how does it relate to the A20M pin in the 486?

23. The 486 uses a pipeline of ____ stages.

24. Give the names of the pipeline stages in the 486.


SECTION 23.2: INTEL'S PENTIUM 


25. The number of pipeline stages in a superpipeline system is (less, more)
than in a superscalar system.

26. Which has one or more execution units, superpipeline or superscalar?

27. The Pentium uses ____ transistors and has ____ pins.

28. The Pentium has a(n) ____-bit external data bus whose pins are named ____.

29. The Pentium is a(n) ____-bit microprocessor.

30. State how many BE pins the Pentium has and their purpose.

31. BE pins are active (low, high).

32. If BE7-BE0 = 11110000, which part of the data buses is activated?

33. If BE7-BE0 = 00000000, which part of the data buses is activated?

34. How many DP (data parity) bits does the Pentium have?

35. Find and compare the Pentium bus bandwidth for the following 60-MHz systems.
(a) nonburst mode (b) burst cycle mode

36. What is the size of on-chip cache in the Pentium?

37. In Problem 36, how much cache is for data and how much for code?

38. Which part of on-chip cache in the Pentium is write protected, data or code?



39. True or false. The Pentium has an on-chip math coprocessor. 

40. What does instruction pairing mean in the Pentium?

41. The Pentium uses (superscalar, superpipeline) architecture. 

42. What is instruction pairing, and when can it happen? 

43. What is data dependency, and how is it avoided? 

44. Write a program for the 386/486/Pentium to calculate the total sum of 10 double- 
word operands. Use looping. 

45. Compare the clock count of the loop in Problem 44 for each of the following. Use
branch prediction for (c) and (d). Note that if two instructions are paired and one
takes 2 clocks and the other takes only 1 clock, the clock count is 2.

(a) 386 (b) 486 (c) a Pentium with instruction pairing but no code scheduling
(d) a Pentium with instruction pairing and code scheduling

46. Calculate the bus bandwidth for a 486DX2-50 and a 486DX4-100. Note that the 
486DX2-50 is 25 MHz and the 486DX4-100 is 33 MHz. 

47. Draw the pipeline stages for the pairing of instructions in the Pentium. 

48. True or false. The Pentium has the A20M pin. 


SECTION 23.3: RISC ARCHITECTURE 


49. Why is RISC called load/store architecture? 

50. In RISC, all instructions are ____-byte.

51. Which of the following instructions do not exist in RISC? 
(a) ADD reg,reg (b) MOV r,immediate (c) OR reg,mem

52. State the steps in a RISC program to add a register to a memory location. 

53. What is the advantage of having all the instructions the same size? 

54. Why are RISC programs larger than CISC programs? 

55. The vast majority of RISC instructions are executed in (1, 2, 3) clocks. 

56. What is Harvard architecture? Is it unique to RISC? Can a CISC system use 
Harvard architecture? 

57. Code a RISC program to add 10 operands of 4-byte (doubleword) size and save the 
result. Do not be concerned with carries. 

58. Show the code scheduling and clock count for Problem 57. 

59. What is a delayed branch? 

60. MS-DOS runs native on which of the following processors? 


(a) IBM/Freescale Power PC RISC (b) Intel x86 
(c) ARM RISC 
61. Generally, RISC processors have ____ registers, each 32 bits wide.


62. Which register in RISC always has the value zero in it, no matter what operation is 
performed on it? 
63. Discuss the terms porting, emulating, and running native. 


SECTION 23.4: PENTIUM PRO PROCESSOR 


64. The Pentium Pro is ____-bit internally and ____-bit externally.

65. The Pentium Pro has ____ address bits.

66. The Pentium Pro is capable of addressing ____ bytes of memory.

67. The Pentium Pro has ____ pins for data bus.


68. A Pentium Pro is advertised as 200 MHz. Is this an internal CPU frequency or a 
bus frequency? 

69. What is the difference between the L2 cache of Pentium and Pentium Pro systems? 

70. Do the 5.5 million transistors used for the Pentium Pro include the L2 cache
transistor count?

71. Which of the x86 processors has out-of-order execution?

72. Are the triadic registers of the Pentium Pro visible to the programmer?

73. True or false. Instructions are fetched according to the order in which they were
written.

74. True or false. Instructions are executed according to the order in which they were
written.

75. True or false. Instructions are retired according to the order in which they were
written.


76. The visible registers EAX, EBX, and so on, are updated by which unit of the CPU? 
77. True or false. Among the instructions, STOREs are never executed out of order. 
78. Which of the x86 processors have branch prediction capability? 


SECTION 23.5: MMX TECHNOLOGY 


79. True or false. The MMX uses x86 and x87 instructions to emulate DSP functions. 

80. A given system has both a general-purpose CPU (such as the x86) and a DSP chip. 
Discuss the role of each chip. 

81. Does the 486 system have MMX technology? 

82. Explain the concept of register aliasing. 

83. Which registers are aliased in MMX technology? 

84. Explain the difference between the way x87 registers are accessed and the way 
MMX accesses the same registers. 

85. Indicate which group of instructions can be intermixed. 
(a) x86, x87 (b) x86, MMX (c)x87, MMX 

86. When leaving MMX, what is the last thing that a programmer should do? 

87. When leaving x87, what is the last thing that a programmer should do? 

88. How many bits of the x87 register are used by MMX? 


89. A quadword has ____ bits.
90. Give other data formats besides quadword that can be viewed by MMX instruc- 
tions. 


91. True or false. Every x86 supports the CPUID instruction. 

92. Explain how to identify the 386 microprocessor. 

93. Explain how to identify the 486 microprocessor. 

94. Explain how to identify the Pentium microprocessor. 

95. Explain how to identify the Pentium Pro microprocessor. 

96. Explain how to identify the Pentium Pro with MMX technology. 

97. How can the CPUID instruction be coded into a program if the assembler does not 
support it? 


ANSWERS TO REVIEW QUESTIONS 


SECTION 23.1: THE 80486 MICROPROCESSOR 


1. 168 pins in PGA 
2. True 

3. 32,22 

4. The 80486 has the math coprocessor on-chip; the 80486SX has the math coprocessor
80487SX (a separate chip).

5. Primary cache, secondary cache or L1 and L2 cache

6. 8K bytes, 2-way set associative



7. 1/50 MHz = 20 ns is the clock cycle. In burst cycle 5 clocks of 20 ns can access 16 
bytes of memory (4 memory cycles x 4 bytes of 32-bit data bus); therefore, (1/100 
ns) x 16 bytes = 160 megabytes/second 
Another way would be average clock per 32-bit access is (5 x 20)/4 = 25 ns and 
(1/25 ns) x 4 bytes = 160 megabytes/second 

8. 33 MHz 

9. Input 

10. Output, like all other address bits in x86 processors 


SECTION 23.2: INTEL'S PENTIUM 


1. 273

2. 64

ie 

4. 16K bytes: 8K for code and 8K for data

5. Code cache

6. True

7. True

8. Since it has two execution units (pipelines) capable of executing two instructions
with one clock

9. False; by the compiler

10. True


SECTION 23.3: RISC ARCHITECTURE 


1. Reduced instruction set computer, complex instruction set computer

2. True

5232 

4. True

5. They are all 4 bytes.

6. True

7. c

8. 32, 8 stack based

9. To take care of double-precision floating-point operands and single-precision
operands

10. False. It uses separate buses for data and code.


SECTION 23.4: PENTIUM PRO PROCESSOR 


1. P6

2. True

3. False

4. False

5. So far, the Pentium Pro

6. Retire unit

7. True


SECTION 23.5: MMX TECHNOLOGY 


1. The x87 instruction set is mainly for math functions such as sine, cosine, log, and 
so on and does not lend itself to DSP-type operations. 


2. (b) & (c)

3. False

4. MM0-MM7

5. False

6. Try to toggle bit 18 of the flag register; in the 386 this bit cannot be changed.

7. Bit 21

8. Pentium

9. EAX=1

10. Yes. After return from CPUID we test bit 23 of the EDX register. If it is high, the
chip has MMX technology.



CHAPTER 24 


THE EVOLUTION OF x86: 


FROM 32-BIT TO 64-BIT 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Compare and contrast the variations of Pentium processors
>> Compare and contrast the variations of x86 32-bit architecture
>> Describe the enhancements of 32-bit architecture in x86 processors
>> Describe the role of L1-L3 caches in x86 processors
>> Describe the role of multicore and multithreading features of x86
>> Describe 64-bit architecture of x86



The x86 microprocessor architecture has gone through major changes since the 
introduction of the first Pentium processor in 1993. This chapter examines some of these 
changes. In Section 24.1, we examine the evolution of the Pentium family from Pentium 
II to Pentium IV. The 64-bit architecture of the x86 and Itanium processors and their 
impact on operating systems such as Vista are discussed in Section 24.2. Although many 
new features of the x86 are present in both Intel and AMD products, for the sake of con- 
venience, we refer only to the Intel chips. 


SECTION 24.1: x86 PENTIUM EVOLUTION 


In the last chapter we examined enhancements introduced in the Pentium and
Pentium Pro processors. In this section, we continue to explore the evolution of the
Pentium from Pentium II to Pentium IV. We will discuss features such as cache,
multithreading, and multicore capabilities introduced in later-generation x86 processors.


From Pentium II to Pentium IV: an overview


To increase the processing power of the Pentium series, Intel added new enhance- 
ments in each of the successive generations of Pentium II, III, and IV. These enhance- 
ments mostly involved the internal architecture and did not change the register size of the 
original 386, which was 32-bit. Next, we examine the enhancements of the 32-bit archi- 
tecture for x86 CPUs introduced in recent years. 


Level 1 and Level 2 caches 


In the Pentium II, the concept of level 2 (L2) cache was introduced. In Pentium 
Pro, we had 8K bytes of cache for code (instruction) and another 8K bytes of cache for 
data, feeding code and data to the fetch unit. This is called level 1 (L1) cache. See Figure 
24-1. 

In the Pentium II the size of L1 cache was increased to 16K bytes each for code
and data. It also added 256K bytes of L2 cache. While L1 cache feeds code (instruction)
and data into the fetch and execution units and is part of the inner working of the CPU,
the L2 cache sits outside the CPU die but still on the same package as the CPU
itself. Since the L1 cache is on the same die as the CPU, it works at the same clock speed
as the CPU. For example, if a given Pentium has a clock speed of 2 GHz, then the L1 cache
feeds the CPU information at that speed. L2 cache works at a fraction of the CPU speed.
For example, if a given Pentium has a clock speed of 2 GHz, then the L2 cache feeds the
CPU information at 1 GHz. When a Pentium with L2 cache brings in code and data from
externally located DRAM memory, it places them in L2 cache. Then the memory management
unit of the core CPU brings in the information from the L2 cache and separates
the code and data, placing each in data or code L1 caches (Harvard architecture). See
Figure 24-2. Notice that code and data caches are separate, which is not the case with the
L2 cache. L2 cache is a unified cache, meaning that the cache is used for both code and data.
The amount allocated to data and code varies dynamically, depending on the nature of the
program being run. If the program being run is more data intensive, then more of the L2
cache is allocated to data. Table 24-1 shows the amount of L1 and L2 cache memories
in the x86 processors and their speed at the time of introduction. With CPU speed rising
above 1 GHz, the biggest problem is external (that is, external to the CPU chip) memory
access time. For that reason, in some high-performance systems the designers place level
3 (L3) cache outside the CPU on the motherboard to speed up the external memory access.
This L3 cache sits between the CPU chip and DRAM memory module on the motherboard.
See Figures 24-1 and 24-2.


Figure 24-1. L1, L2, and L3 Cache
(diagram labels: CPU Package, CPU Core, L1 Cache, L2 Cache, L3 Cache, Address Bus)


Figure 24-2. L2 Cache Feeding Code and Data to L1 Caches
(diagram labels: Branch Prediction, Fetch/Decode, Execution, Retirement; L1 Data Cache and
L1 Instr. Cache, marked Harvard Architecture; L2 Cache, Bus Interface Unit, and System Bus,
marked von Neumann Architecture)


Table 24-1: Pentium Cache Size and Speed

Product       Year   Transistors   CLK Speed   L1 Cache   L1 Cache   L2 Cache
                                               (Code)     (Data)
386           1985   275K          20 MHz      0          0          0
486           1989   1.2M          25 MHz      4 KB       4 KB       0
Pentium       1993   3.1M          60 MHz      8 KB       8 KB       0
Pentium Pro   1995   5.5M          200 MHz     8 KB       8 KB       0
Pentium II    1997   7M            266 MHz     16 KB      16 KB      256 KB
Pentium III   1999   8.2M          700 MHz     16 KB      16 KB      512 KB
Pentium IV    2000   42M           1.5 GHz     12 KB      8 KB       256 KB
Pentium M     2004   140M          2 GHz       32 KB      32 KB      2 MB
Pentium IV    2004   125M          3.4 GHz     12 KB      16 KB      1 MB
Pentium Duo   2006   152M          2.16 GHz    32 KB      32 KB      2 MB

Note: The clock speed is the speed at the time of introduction.



In some processors such as the Itanium with multiple cores, there is L2 cache
on the same package as the CPU, but located outside the CPU die. In such a processor one
can summarize the role of L1-L3 caches as follows:

(1) The speed of the L1 cache is a multiple of the CPU speed, since the CPU can 
retire multiple instructions per clock cycle. 

(2) L2 cache works at the same speed as the CPU since it is on the same package 
as the CPU. 

(3) L3 cache works at a fraction of the speed of the CPU since it is located out- 
side the CPU package. 


Hyper-Threading Technology (HTT) 


In the Pentium IV, the concept of multithreaded execution was introduced. First,
the definition of a thread: it is a sequence of instructions within a program that can be
scheduled and run independently of other threads, so that several threads can run on
different CPUs simultaneously. In the multiprocessor environment, each program is given
its own CPU and memory. Intel placed multiple logical CPUs into a single chip and called
it hyper-threading. Therefore, hyper-threading in its simplest form allows a single CPU to
execute two or more threads of code simultaneously. Of course, to do that the CPU must be
equipped with internal logic and resources to execute the threads. Many of the early
Pentium CPUs were not equipped with hyper-threading technology, since it requires a large
number of transistors to duplicate many of the resources inside the CPU. As far as the
operating system is concerned, the CPU with hyper-threading capability appears to be
multiple logical CPUs inside a single physical CPU. Therefore, to take full advantage of
hyper-threading technology, both the operating system and the application must be
rewritten (or reconfigured) to make them thread-aware. For that reason, the latest
generation of Microsoft XP and Vista operating systems, along with many widely used
applications, are using the hyper-threading features of the x86 processors to speed up
program execution. The ideal situation in the multithreaded environment is to write the
application programs so that threads can execute independently of each other. However,
that is not the case in the real world. Since both logical processors inside the
hyper-threaded CPU use the same bus to access memory, they can get in each other's way and
slow down program execution. Figure 24-3 shows the system bus access for the threaded CPU
and multiprocessing. Note that in threaded CPUs, internal logical CPUs must share the
system bus access. This is in contrast to using multiple processors in which each CPU has
its own access to the system bus. See Figure 24-4.


Figure 24-3. Multithreading vs. Multiprocessing
(diagram labels: CPU Package, Thread N, CPU 1, Address Bus, Data Bus, Hard Drive)



Figure 24-4. Intel Multithreading (Courtesy of Intel Corp.)
(diagram labels: IA-32 Processor Supporting Hyper-Threading Technology, with two logical
processors that share a single core; Traditional Multiple Processor (MP) System, in which
each processor is a separate physical package; AS = IA-32 Architectural State)


In x86 literature the words threads and tasks are used interchangeably. However, 
there is a difference between a task and a thread. In multitasking you are running multi- 
ple tasks such as playing music, typing into a word processor, and running a virus scan all 
on a single CPU. In multitasking the CPU switches from one task to another in a round 
robin (circular) fashion, giving each task a slice of the CPU’s time. In contrast, true mul- 
tithreading attempts to parallelize the execution of a single program in order to speed up 
the execution of that program. Not all applications lend themselves to parallelization and 
that is the reason that not all programs benefit equally from multithreaded CPUs. Much 
of the multithreaded feature of the x86 comes from its multicore capability, which is dis- 
cussed next. For more discussion of multithreading and multitasking, see the following 
article: 


http://arstechnica.com/articles/paedia/cpu/hyperthreading.ars 
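
The idea of parallelizing a single program can be sketched in Python as follows. This is a minimal illustration, not a benchmark: the function name parallel_sum and the choice of two workers (standing in for two logical CPUs) are assumptions made for the example. Note also that in the standard CPython interpreter, threads doing pure computation take turns rather than running truly in parallel, so the sketch shows how the work is divided, not a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, workers=2):
    """Split one summation across `workers` threads and combine the partial sums."""
    if not values:
        return 0
    chunk = -(-len(values) // workers)  # ceiling division: items per thread
    pieces = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, pieces))  # one partial sum per piece
    return sum(partial_sums)


print(parallel_sum(list(range(1, 101))))  # same result as sum(range(1, 101))
```

The partial sums are independent of one another, which is why this task parallelizes cleanly. A computation in which each step needs the result of the previous step could not be divided this way.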


Multicore Technology 


Many newer-generation Pentiums have what is called multicore technology. 
Multicore packs two or more independent microprocessors (called cores) into a single 
chip. At this time, many x86 CPUs come with dual-core and quad-core features. Both Intel 
and AMD are working on processors with 8 cores. In the dual-core CPU, almost everything
is doubled, which is like putting two physical CPUs into a single chip. The difference
between multicore and multiprocessor CPUs is that in the multicore CPU there is one
pathway to the system memory for the CPU while in the multiprocessor CPUs each 
processor has its own memory space independent of the others. See Figure 24-5. 

See the following for more on multicore CPUs: 


http://en.wikipedia.org/wiki/Multicore_(computing) 


Figure 24-5. Dual-Core Processor from Intel
(diagram labels: two cores, each with Branch Prediction, Fetch/Decode, Execution, Retirement,
L1 Instr. Cache, and L1 Data Cache; L2 Cache; Bus Interface Unit; System Bus)


Streaming SIMD Extension (SSE) 


The concept of single instruction multiple data (SIMD) was first incorporated into 
the Pentium Pro with the introduction of MMX technology, as we saw in the last chapter. 
The original MMX used SIMD operations on packed bytes of data. Intel expanded the 
SIMD to include operations on floating-point data and called it streaming SIMD extension 
(SSE). The SSE was expanded further to include both single- and double-precision
floating-point data, which are referred to as SSE, SSE2, SSE3, and SSSE3 in the Intel
literature. While the original SIMD had one set of MMX registers, the extended SIMD has
several sets of MMX registers in addition to a new set of 128-bit registers called XMM. See
Figure 24-6. 
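
What a SIMD operation on packed bytes means can be illustrated with a short Python sketch. It is only a software model: the function name packed_add_bytes is hypothetical, a 64-bit integer stands in for one MMX register, and wraparound (non-saturating) addition is assumed for each byte.

```python
LANES = 8        # a 64-bit register viewed as 8 packed bytes
LANE_BITS = 8


def packed_add_bytes(a, b):
    """Add two 64-bit values lane by lane; a carry never crosses into the next byte."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= ((x + y) & 0xFF) << shift  # keep only the low 8 bits of each sum
    return result


print(hex(packed_add_bytes(0x0102030405060708, 0x1010101010101010)))
print(hex(packed_add_bytes(0x00000000000000FF, 0x0000000000000001)))
```

In the second call the lowest byte wraps around to 00 and nothing is carried into the neighboring byte, which is the difference between a packed add and an ordinary 64-bit add. In hardware all eight lanes are processed by a single instruction rather than by a loop.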


Other features of Pentium 


The Pentium increased the number of pipeline stages to 10 and eventually to 20 
to make it a super superpipeline CPU. It also expanded the instruction decoder units to 
four with a hugely expanded size of instruction queue. These improvements, along with 
greatly expanded out-of-order executions and branch prediction capability, allowed the 
CPU to retire up to four instructions per clock cycle. The combination of the above 
enhancements was still insufficient to keep the high-speed system designers happy. That 
left x86 CPU companies such as Intel and AMD no choice but to expand the size of gen- 
eral-purpose registers to 64-bit. The change to 64-bit architecture came about only when 
increase in performance of the 32-bit architecture was no longer possible. For this reason, 
we see two distinctive architectures for the x86 being discussed in recent years: 32-bit and 
64-bit. The performance enhancements of 64-bit architecture for the x86 processors are 
discussed in the next section. 


Review Questions 


1. True or false. All Pentiums have 16 KB L1 cache.

2. True or false. All Pentiums have L2 cache on the die.

3. True or false. All Pentiums have the hyper-threading feature.

4. What is the difference between multicore and multiprocessor?

5. True or false. The Pentium IV has a much deeper pipeline than the Pentium I.



SIMD Extension      Register Layout   Data Type

MMX Technology      MMX Registers     8 Packed Byte Integers
                                      4 Packed Word Integers
                                      2 Packed Doubleword Integers

SSE                 MMX Registers     8 Packed Byte Integers
                                      4 Packed Word Integers
                                      2 Packed Doubleword Integers
                    XMM Registers     4 Packed Single-Precision
                                      Floating-Point Values

SSE2/SSE3/SSSE3     MMX Registers     2 Packed Doubleword Integers
                    XMM Registers     Packed Floating-Point Values
                                      Packed Byte Integers
                                      Packed Word Integers

Figure 24-6. SIMD Extension in x86 (Courtesy of Intel Corp.)



SECTION 24.2: 64-BIT PROCESSORS AND VISTA FOR x86 


The original 8086/88 was a 16-bit processor. Although the 80286 did improve 
performance in many aspects, it was still a 16-bit chip. Since IBM used both the 8088 and 
80286 in the original PCs, the DOS operating system and early application software were 
based on 16-bit architecture. The 80386, introduced in 1985, was the first 32-bit processor
in the x86 family. It was not until the 1990s that Microsoft took advantage of the
32-bit architecture of the 386 and introduced Windows NT. Subsequent improvements
introduced into the original 386 in the form of Pentium I-IV family members allowed 
Microsoft to bring out more robust operating systems such as Windows 2000 and XP. It 
took many years and a massive effort to convert 16-bit DOS applications to the Windows 
NT/2K/XP 32-bit architecture. With the introduction of 64-bit x86 processors, the same 
efforts are needed again to get applications ready for the 64-bit Microsoft Vista. In this 
section, we discuss the x86 64-bit processors and their impact on Vista. 


64-bit architecture in x86 


The design of the 64-bit x86 microprocessor needed to maintain compatibility 
with the earlier generations of 386 legacy software. This was the same problem that 386 
designers faced a decade earlier when they moved away from the 80286. In the case of 
the 386, the 16-bit registers of AX, BX, and so on from the 8086/286 were extended to 
32-bit, and were called EAX, EBX, and so on. In this way, the 8086/286 registers became 
a subset of 386 registers. See Figure 24-7. To make the transition from 32-bit to 64-bit 
data types easier, the 64-bit architecture can work in two modes: (1) compatibility mode, 
and (2) 64-bit mode. In compatibility mode, most 16-bit and 32-bit legacy application 
software can be run without recompilation under 64-bit operating systems such as 
Microsoft Vista. Compatibility mode uses the 8-bit, 16-bit, and 32-bit data sizes and limits
the memory space to 4G bytes, just like 386 architecture. In 64-bit mode, the 386 registers
EAX, EBX, and so on are expanded to 64-bit, and are designated as RAX, RBX, and
so on. It also added another 8 new 64-bit general purpose registers called R8-R15. That
gives us a total of 16 64-bit registers. The new R8-R15 registers can also be accessed as 
8-bit, 16-bit, and 32-bit registers. See Figure 24-8. 
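
The way the narrower registers form a subset of the wider ones can be modeled with bit masks. The Python sketch below is an illustration only: the function name register_views and the sample value are hypothetical, and it shows just the low-order views of one register (RAX, EAX, AX, and AL).

```python
def register_views(rax):
    """Return the 64-, 32-, 16-, and 8-bit low-order views of a 64-bit value."""
    rax &= 0xFFFFFFFFFFFFFFFF        # keep 64 bits
    return {
        "RAX": rax,                  # full 64-bit register
        "EAX": rax & 0xFFFFFFFF,     # low 32 bits (the 386 register)
        "AX": rax & 0xFFFF,          # low 16 bits (the 8086/286 register)
        "AL": rax & 0xFF,            # low 8 bits
    }


for name, value in register_views(0x1122334455667788).items():
    print(f"{name:>3} = {value:X}")
```

Each narrower name simply selects the low-order bits of the same storage, which is why legacy 16-bit and 32-bit code can keep using AX and EAX on a 64-bit processor.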


External memory space of 64-bit architecture 


The 32-bit architecture of the Pentium limited the external memory space to 4G 
bytes. It also used segment registers to convert the linear addresses to physical addresses, 
as we saw in previous chapters. The 64-bit architecture stopped using segment registers 
for the most part, and introduced a flat memory space of 2^64 bytes. The EIP (instruction
pointer) of the 386 has been expanded from 32-bit to 64-bit, and is called RIP. It allows
access of up to 2^64 bytes of code and data stored in external memory using address
pins A0-A63. There is also a new 64-bit data pointer register for accessing externally
located data memory using the A0-A63 address pins. However, to save pins, not all family
members have all 64 address pins. In many cases, the 64-bit x86 processors have
only 36 pins (A0-A35), which allow access of up to 64G bytes of external memory. In
some products, we also see 40 pins (A0-A39) for external memory connections, giving us
a maximum of 2^40 bytes of memory space. The expansion of the instruction pointer from
32-bit to 64-bit will assure sufficient memory space for the foreseeable future. This especially
will help in the implementation of multiprocessor systems using a large number of
processors, giving each one its own memory space of 4 to 64 gigabytes. In 32-bit archi- 
tectures, register sizes match the 32-bit external data bus. In CPUs such as the Pentium 
Pro, the external data bus was expanded to 64-bit, even though the registers were left at 
32-bit. The reason for having more external data pins than the register size is to increase 
the bus bandwidth and therefore bring in more code and data to keep the CPU busy. In 
the case of 64-bit registers, the external data bus is also 64-bit. To increase the bus band- 
width of the 64-bit x86 processors, it is very likely we will see 128 pins for the data bus 
in the near future. 
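
The relationship between the number of address pins and the size of the external memory space is a power of two, which the following Python sketch works out. The function name addressable_bytes is hypothetical, and G byte is taken to mean 2^30 bytes, as in the 4G-byte and 64G-byte figures above.

```python
G_BYTE = 2 ** 30  # one G byte, taken here as 2^30 bytes


def addressable_bytes(address_pins):
    """Each added address pin doubles the number of bytes that can be addressed."""
    return 2 ** address_pins


for pins in (32, 36, 40):
    print(pins, "address pins ->", addressable_bytes(pins) // G_BYTE, "G bytes")
```

With 32 pins the result is the 4G bytes of the 386, with 36 pins it is 64G bytes, and with 40 pins it is 1024G bytes, that is, 2^40 bytes.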


Address Space

General-Purpose Registers (eight 32-bit registers): EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
Segment Registers (six 16-bit registers): CS, DS, SS, ES, FS, GS
EFLAGS Register (32 bits)
EIP (Instruction Pointer Register) (32 bits)

FPU Registers
    Eight 80-bit Floating-Point Data Registers
    Control Register, Status Register, Tag Register (16 bits each)
    Opcode Register (11 bits)
    FPU Instruction Pointer Register (48 bits)
    FPU Data (Operand) Pointer Register (48 bits)

MMX Registers: eight 64-bit registers
XMM Registers: eight 128-bit registers
MXCSR Register (32 bits)

Figure 24-7. Intel Architecture - 32 (IA-32) Registers



[Figure 24-8: register set of 64-bit x86 processors; address space of 2^64]

General-Purpose Registers (sixteen 64-bit registers): RAX, RBX, RCX, RDX, RSI, RDI, RBP,
    RSP, R8, R9, R10, R11, R12, R13, R14, R15
Segment Registers (six 16-bit registers): CS, DS, SS, ES, FS, GS
RFLAGS Register (64 bits)
RIP (Instruction Pointer Register) (64 bits)
FPU Registers: eight 80-bit floating-point data registers; Control Register; Status
    Register; Tag Register; Opcode Register (11 bits); FPU Instruction Pointer Register;
    FPU Data (Operand) Pointer Register
MMX Registers: eight 64-bit registers
XMM Registers: sixteen 128-bit registers
MXCSR Register (32 bits)

Figure 24-8. 64-Bit Registers in 64-Bit x86 Processors




x86 64-bit processors and Vista 


Both Intel and AMD have been producing 64-bit x86 processors since the early
2000s. Microsoft used some aspects of the 64-bit architecture in Windows XP and called
it XP x64. It was only after the introduction of Vista that Microsoft took full advantage of
the 64-bit architecture. Because of legacy software and the need for a transition to the
new 64-bit systems, there are two versions of Vista: Vista and Vista x64. While Vista will
run on any 32-bit x86 architecture, Vista x64 requires a 64-bit x86 processor and at least
1 GB of DRAM on the motherboard. In 32-bit Vista and XP, the operating system divides
4 GB of physical memory space into two sections, allocating one half to user applications
and the other half to the kernel, drivers, video, and other needs of the OS. Vista x64
raised this memory from 4 GB to 16 terabytes (TB) (2^44 bytes), and uses 8 TB for
itself and the other 8 TB for user applications. Of course, it is not practical to have
8 TB of DRAM on the motherboard at this time. For that reason, Vista x64 limits the
memory to 128 GB of DRAM for a single processor.
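
The memory figures quoted above follow directly from powers of two. The short Python sketch below is only an illustration of that arithmetic; the function name and the assumption of an even user/system split are ours, chosen to mirror the description in the text.

```python
# Binary prefixes as used in the text: 1 GB = 2**30 bytes, 1 TB = 2**40 bytes.
GB = 2**30
TB = 2**40

def split_address_space(address_bits):
    """Return (total, user_half, system_half) in bytes, assuming an even split."""
    total = 2**address_bits
    half = total // 2
    return total, half, half

total32, user32, system32 = split_address_space(32)   # 32-bit Vista and XP
total44, user44, system44 = split_address_space(44)   # Vista x64

print("32-bit:", total32 // GB, "GB total,", user32 // GB, "GB for user applications")
print("x64:   ", total44 // TB, "TB total,", user44 // TB, "TB for user applications")
```

With 32 address bits this gives the 4 GB space split into two 2 GB halves, and with 44 bits it gives the 16 TB space split into two 8 TB halves.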


Moore’s Law 
In the mid-1960s, Intel cofounder Gordon Moore made the following astounding
prediction: “The number of transistors that would be incorporated on a silicon die would
double every 18 months for the next several years.” Examining Tables 24-1 and 24-2
shows how this prediction has come true.


Table 24-2: Some 64-Bit Processors in x86 Family


Product                 Year   Transistors   Maximum Ext. Memory
Xeon                    2004   125 M         64 GB
Xeon Processor MP       2005   675 M         1024 GB (1 TB)
Pentium 4               2005   164 M         64 GB
Pentium 840             2005   230 M         64 GB
Dual-Core Xeon          2005   321 M         64 GB
Pentium 4 672           2005   164 M         64 GB
Pentium 955             2006   376 M         64 GB
Core 2 X6800            2006   291 M         64 GB
Xeon 5160               2006   291 M         64 GB
Xeon 7140               2006   1.3 B         64 GB
Core 2 QX6700           2006   582 M         64 GB
Quad-Core Xeon 5355     2006   582 M         64 GB


Note: For the registers associated with these 64-bit processors see Figure 24-6. 


Review Questions 


1. Name the two modes of operation in 64-bit x86 processors.

2. True or false. All 64-bit x86 processors have L3 cache.

3. True or false. Both the compatibility mode and 64-bit mode can use 64-bit registers.

4. The size of external memory in compatibility mode is limited to ______ bytes.

5. How many 64-bit registers do we have in 64-bit x86 processors?

PROBLEMS 


SECTION 24.1: x86 PENTIUM EVOLUTION 


1. True or false. L1 caches for code and data are separate.

2. True or false. Code and data use the same L2 cache.

3. What is the size of L1 cache in the Pentium III?

4. What is the size of L2 cache in the Pentium III?

5. What is the size of L1 cache in the Pentium IV (2006 version)?

6. What is the size of L2 cache in the Pentium IV (2006 version)?




7. What is the size of L1 cache in the Pentium Duo (2006 version)? 

8. What is the size of L2 cache in the Pentium Duo (2006 version)? 

9. What are differences between L1 and L2 caches? 

10. What is L3 cache? 

11. What is the size of L3 cache in the Pentium IV (2006 version)? 

12. Compare the speed of the L1—L3 caches in terms of the CPU speed. 

13. True or false. All Pentiums have the hyper-threading feature. 

14. True or false. All Pentiums have the multicore feature. 

15. Explain the differences between multithreading and multitasking. 

16. Explain the differences between multicore and multiprocessor. 

17. Explain the difference between MMX and XMM. 

18. True or false. The latest MMX supports both single- and double-precision floating- 
point data. 

19. True or false. The latest x86 Pentium uses out-of-order execution. 

20. True or false. The latest x86 Pentium uses branch prediction. 


SECTION 24.2: 64-BIT PROCESSORS AND VISTA FOR x86 


21. True or false. The 32-bit applications written for the 32-bit 386 architecture cannot 
be run on 64-bit x86 processors. 

22. Name all the 64-bit general-purpose registers of x86 64-bit processors. 

23. Name all the 32-bit general-purpose registers of x86 64-bit processors. 

24. Name all the 16-bit general-purpose registers of x86 64-bit processors. 

25. Name all the 8-bit general-purpose registers of x86 64-bit processors. 

26. What are the main modes of operation for 64-bit x86 processors? 

27. What is the maximum memory accessible by the compatibility mode of the 64-bit 
x86? 

28. What is the maximum memory accessible by the 64-bit mode of the x86? 

29. What is the maximum memory accessible by the XP for 32-bit architecture? 

30. What is the maximum memory accessible by the Vista x64? How much of that is 
implemented now? 

31. Can we access the 64-bit registers in all modes? 

32. The Vista x64 limits maximum memory to ______ GB.


ANSWERS TO REVIEW QUESTIONS 


SECTION 24.1: x86 PENTIUM EVOLUTION 


1. False
2. False
3. False
4. In multicore, we have several CPUs inside a single chip accessing the same main
   memory space, while in multiprocessor, each CPU has its own memory space.
5. True


SECTION 24.2: 64-BIT PROCESSORS AND VISTA FOR x86 


1. Compatibility mode and 64-bit mode
2. False




CHAPTER 25 


SYSTEM DESIGN ISSUES AND 
FAILURE ANALYSIS 


OBJECTIVES 
Upon completion of this chapter, you will be able to: 


>> Contrast and compare MOS and bipolar transistors 

>> Evaluate logic families according to speed, power dissipation, noise 
immunity, input/output interface compatibility, and cost 

>> Trace the evolution of Intel x86 microprocessors in terms of IC 
technology 

>> Define IC fan-out and describe why connecting an output to too many 
inputs can cause false logic 

>> Describe capacitance derating and its effect on system design and the use 
of buffers to decrease its effect 

>> Discuss power in system design, including static and dynamic currents, 
power dissipation, and sleep mode for peripherals 

>> Define ground bounce and VCC bounce and describe methods that 
designers use to avoid the false signal generation that they may cause 

>> Define crosstalk and describe ways to avoid crosstalk in system design 

>> Define transmission line ringing and discuss ways to reduce its effect 

>> Define measures of system reliability: FIT and MTBF 




The invention of the transistor and the subsequent advent of integrated circuit (IC)
technology are believed by many to be the start of the second industrial revolution. In this
chapter we provide an overview of IC technology and interfacing. In addition, we look at
the computer system as a whole and examine some general considerations in system
design. Section 25.1 provides an overview of IC technology. IC interfacing and system
design considerations, including failure analysis in systems, are examined in Section 25.2.


SECTION 25.1: OVERVIEW OF IC TECHNOLOGY 


In this section we provide an overview of IC technology and discuss recent devel- 
opments in advanced logic families. 

The transistor was invented in 1947 by three scientists at Bell Laboratories. In the
1950s, transistors replaced vacuum tubes in many electronics systems, including computers.
It was not until 1959 that the first integrated circuit was successfully fabricated and
tested by Jack Kilby of Texas Instruments. Prior to the invention of the IC, the use of
transistors, along with other discrete components such as capacitors and resistors, was
common in computer design. Early transistors were made of germanium, which was later
abandoned in favor of silicon. This was because the slightest rise in temperature
resulted in massive current flows in germanium-based transistors. In semiconductor
terms, the band gap of germanium is much smaller than that of silicon, resulting in a
massive flow of electrons from the valence band to the conduction band when the
temperature rises even slightly. By the late 1960s and early 1970s, the use of the
silicon-based IC was widespread in mainframes and minicomputers. Early transistors and ICs
were based on P-type materials. Because the speed of electrons is much higher (about
two and a half times) than the speed of holes, N-type devices replaced P-type devices.
By the mid-1970s, NPN and NMOS transistors had replaced the slower PNP and PMOS
transistors in every sector of the electronics industry, including the design of
microprocessors and computers. Since the early 1980s, CMOS (complementary MOS) has
become the dominant method of IC design. Next we provide an overview of the differences
between MOS and bipolar transistors.


MOS vs. bipolar transistors 


There are two types of transistors: bipolar and MOS (metal-oxide semiconductor).
Both have three leads. In bipolar transistors, the three leads are referred to as the emitter,
base, and collector, while in MOS transistors they are named source, gate, and drain. In
bipolar, the carrier flows from the emitter to the collector and the base is used as a flow
controller. In MOS, the carrier flows from the source to the drain and the gate is used as
a flow controller. In NPN-type bipolar transistors, the electron carrier leaving the emitter
must overcome two voltage barriers before it reaches the collector (see Figure 25-1). One
is the N-P junction of the emitter-base and the other is the P-N junction of the
base-collector. The voltage barrier of the base-collector is the most difficult one for the
electrons to overcome (since it is reverse biased) and it causes the most power dissipation.
This led to the design of the unipolar type of transistor called MOS. In N-channel MOS
transistors, the electrons leave the source and reach the drain without going through any
voltage barrier. The absence of any voltage barrier in the path of the carrier is one reason
why MOS dissipates much less power than bipolar transistors. The low power dissipation of
MOS allows putting millions of transistors on a single IC chip. In today's million-transistor
microprocessors and DRAM memory chips, the use of MOS technology is indispensable.
Without the MOS transistor, the advent of desktop personal computers would not have
been possible, at least not so soon. The use of bipolar transistors in both the mainframes
and minicomputers of the 1960s and 1970s required expensive cooling systems and large
rooms due to their bulkiness. MOS transistors do have one major drawback: They are
slower than bipolar transistors. This is due partly to the gate capacitance of the MOS
transistor. For a MOS transistor to be turned on, the input capacitor of the gate takes time
to charge up to the turn-on (threshold) voltage, leading to a longer propagation delay.


[Figure: a bipolar NPN transistor and an NMOS transistor]

Figure 25-1. Bipolar vs. MOS Transistors


Overview of logic families


Logic families are judged according to (1) speed, (2) power dissipation, (3) noise 
immunity, (4) input/output interface compatibility, and (5) cost. Desirable qualities are 
high speed, low power dissipation, and high noise immunity (since it prevents the occur- 
rence of false logic signals during switching transition). In interfacing logic families, the 
more inputs that can be driven by a single output, the better. This means that high-driv- 
ing-capability outputs are desired. This plus the fact that the input and output voltage lev- 
els of MOS and bipolar transistors are not compatible means that one must be concerned 
with the ability of one logic family driving the other one. In terms of the cost of a given 
logic family, it is high during the early years of its introduction and prices decline as pro- 
duction and use rise. 


The case of inverters 


As an example of logic gates, we look at a simple inverter. In a one-transistor 
inverter, while the transistor plays the role of a switch, R is the pull-up resistor. See Figure 
25-2. However, for this inverter to work effectively in digital circuits, the R value must 
be high when the transistor is "on" to limit the current flow from VCC to ground in order 
to have low power dissipation (P = VI, where V = 5 V). In other words, the lower the I, 
the lower the power dissipation. On the other hand, when the transistor is "off", R must be 
a small value to limit the voltage drop across R, thereby making sure that VOUT is close 
to VCC. This is a contradictory demand on R. This is one reason that logic gate designers 
use active components (transistors) instead of passive components (resistors) to imple- 
ment the pull-up resistor R. 
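
The contradictory demand on R can be seen numerically. The Python sketch below is a simplified illustration, not a circuit simulation: it assumes an ideal switch (0 V across the transistor when it is on) and a hypothetical load that draws 400 µA through R when the transistor is off. Both assumptions, and the function names, are ours.

```python
VCC = 5.0  # volts, as in the text (P = VI, where V = 5 V)

def on_state_power(r_ohms, vcc=VCC):
    """Power drawn from VCC while the transistor is on (ideal switch assumed)."""
    current = vcc / r_ohms        # I = V / R
    return vcc * current          # P = V * I

def off_state_vout(r_ohms, load_current, vcc=VCC):
    """VOUT while the transistor is off and the load draws load_current through R."""
    return vcc - load_current * r_ohms

# Hypothetical resistor values, chosen only to show the trend.
for r in (100.0, 1_000.0, 10_000.0):
    p_mw = on_state_power(r) * 1e3
    vout = off_state_vout(r, load_current=400e-6)
    print(f"R = {r:7.0f} ohm: P(on) = {p_mw:6.1f} mW, VOUT(off) = {vout:.2f} V")
```

A large R keeps the on-state power low but pulls VOUT far below VCC in the off state, while a small R does the opposite, which is the conflict described above.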

The case of a TTL inverter with totem pole output is shown in Figure 25-3. In 
Figure 25-3, Q3 plays the role of a pull-up resistor. 


CMOS inverter 


In the case of CMOS-based logic gates, PMOS and NMOS are used to construct 
a CMOS (complementary MOS) inverter as shown in Figure 25-4. In CMOS inverters, 
when the PMOS transistor is off, it provides a very high impedance path, making leakage 
current almost zero (about 10 nA); when the PMOS is on, it provides a low resistance on 
the path of VDD to load. Since the speed of the hole is slower than that of the electron, 
the PMOS transistor is wider to compensate for this disparity; therefore, PMOS transis- 
tors take more space than NMOS. 


[Figure: two one-transistor inverters, each with pull-up resistor Rc connected to Vcc.
With the input high the transistor is on and the output is low; Rc must be a very high
value. With the transistor off, Rc must be a very low value.]

Figure 25-2. One-Transistor Inverter with Pull-up Resistor


Figure 25-3. TTL Inverter with Totem-Pole Output


Input, output characteristics of some logic families


In 1968 the first logic family made of bipolar transistors was marketed. It was
commonly referred to as the standard TTL (transistor-transistor logic) family. The first
MOS-based logic family, the CD4000/74C series, was marketed in 1970. The addition of
the Schottky diode to the base-collector of bipolar transistors in the early 1970s gave rise
to the S family. The Schottky diode shortens the propagation delay of the TTL family by
preventing the collector from going into what is called deep saturation. Table 25-1 lists
major characteristics of some logic families. In Table 25-1, note that as the CMOS circuit's
operating frequency rises, the power dissipation also increases. This is not the case for
bipolar-based TTL.


[Figure: CMOS inverter, showing the input and the output between the supply and 0 V]

Figure 25-4. CMOS Inverter


Table 25-1: Characteristics of Some Logic Families


Characteristic                      STD TTL    LSTTL      ALSTTL     HCMOS

VCC                                 5 V        5 V        5 V        5 V
VIH                                 2.0 V      2.0 V      2.0 V      3.15 V
VIL                                 0.8 V      0.8 V      0.8 V      1.1 V
VOH                                 2.4 V      2.7 V      2.7 V      3.7 V
VOL                                 0.4 V      0.5 V      0.4 V      0.4 V
IIL                                 -1.6 mA    -0.36 mA   -0.2 mA    -1 µA
IIH                                 40 µA      20 µA      20 µA      1 µA
IOL                                 16 mA      8 mA       4 mA       4 mA
IOH                                 -400 µA    -400 µA    -400 µA    4 mA
Propagation delay                   10 ns      9.5 ns     4 ns       9 ns
Static power dissipation (f = 0)    10 mW      2 mW       1 mW       0.0025 nW
Dynamic power dissipation
  at f = 100 kHz                    10 mW      2 mW       1 mW       0.17 mW


History of logic families 


Early logic families and microprocessors required both positive and negative
power voltages. For example, Intel's 4004, 8008, and 8080 all used negative and positive
voltages for the power supply. In the mid-1970s, 5 V VCC became standard. In the late
1970s, advances in IC technology allowed combining the speed and drive of the S family
with the lower power of LS to form a new logic family called FAST (Fairchild Advanced




Table 25-2: Logic Family Overview


            Year         Speed   Static Supply   High/Low Family
Product     Introduced   (ns)    Current (mA)    Drive (mA)
Std TTL     1968         40      30              -2/32
CD4K/74C    1970         70      0.3             -0.48/6.4
LS/S        1971         18      54              -15/24
HC/HCT      1977         25      0.08            -6/-6
FAST        1978         6.5     90              -15/64
AS          1980         6.2     90              -15/64
ALS         1980         10      27              -15/64
AC/ACT      1985         10      0.08            -24/24
FCT         1986         6.5     1.5             -15/64


Reprinted by permission of Electronic Design Magazine, c. 1991.


Schottky TTL). In 1985, AC/ACT (Advanced CMOS Technology), a much higher speed 
version of HCMOS, was introduced. With the introduction of FCT (Fast CMOS 
Technology) in 1986, at last the speed gap between CMOS and TTL was closed. Since 
FCT is the CMOS version of FAST, it has the low power consumption of CMOS but the 
speed is comparable with TTL. Table 25-2 provides an overview of logic families up to
FCT.


Recent advances in logic families 


As the speed of high-performance microprocessors such as the 386 and 486
reached 25 MHz, the CPU's cycle time shortened, leaving less time for the path delay.
Designers normally allocate no more than 25% of a CPU's cycle time budget to path delay.
Following this rule means that there must be a corresponding decline in the propagation
delay of logic families used in the address and data path as the system frequency is
increased. In recent years, many semiconductor manufacturers have responded to this
need by providing logic families that have high speed, low noise, and high drive. Table
25-3 provides the characteristics of high-performance logic families introduced in recent
years. ACQ/ACTQ are the second-generation advanced CMOS (ACMOS) with much
lower noise. While ACQ has the CMOS input level, ACTQ is equipped with TTL-level
input. The FCTx and FCTx-T are second-generation FCT with much higher speed. The x
in the FCTx and FCTx-T refers to various speed grades, such as A, B, and C, where the
A designation means low speed and C means high speed. For designers who are well
versed in using the FAST logic family, the use of FASTr is an ideal choice since it is faster
than FAST, has higher driving capability (IOL, IOH), and produces much lower noise than
FAST. At the time of this writing, next to ECL and gallium arsenide logic gates, FASTr
is the fastest logic family in the market (with the 5 V VCC), but the power consumption is
high relative to other logic families, as shown in Table 25-3. In recent years, a 3.3 V VCC
with higher speed and lower power consumption has become standard. The combining of


Table 25-3: Advanced Logic General Characteristics


                 Number of   Tech                             Static
Family   Year    Suppliers   Base      I/O Level   Speed (ns) Current   IOH/IOL

ACQ      1989    2           CMOS      CMOS/CMOS   6.0        80 µA     -24/24 mA
ACTQ     1989    2           CMOS      TTL/CMOS    7.5        80 µA     -24/24 mA
FCTx     1987    3           CMOS      TTL/CMOS    4.1-4.8    1.5 mA    -15/64 mA
FCTxT    1990    2           CMOS      TTL/TTL     4.1-4.8    1.5 mA    -15/64 mA
FASTr    1990    1           Bipolar   TTL/TTL     3.9        50 mA     -15/64 mA
BCT      1987    2           BICMOS    TTL/TTL     5.5        10 mA     -15/64 mA


Reprinted by permission of Electronic Design Magazine, c. 1991.




high-speed bipolar TTL and the low power consumption of CMOS has given birth to what 
is called BICMOS. Although BICMOS seems to be the future trend in IC design, at this 
time it is expensive due to the extra steps required in BICMOS IC fabrication, but in some 
cases there is no other choice. For example, Intel's Pentium microprocessor, a BICMOS 
product, had to use high-speed bipolar transistors to speed up some of the internal func- 
tions in order to keep up with RISC processor performance. Table 25-3 provides advanced 
logic characteristics. Table 25-4 shows logic families used in systems with different 
speeds. The x is for the different speeds where A, B, and C are used for designation. A is 
the slowest one while C is the fastest one. The above data is for the 'LS244 buffer. 
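
As a numeric illustration of the 25% path-delay rule mentioned in this section, the Python sketch below converts a system clock frequency into a cycle time and then into the share of that cycle available for path delay. The function names are ours, and the frequencies other than 25 MHz are arbitrary sample values.

```python
def cycle_time_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

def path_delay_budget_ns(freq_mhz, fraction=0.25):
    """Portion of the cycle time allotted to path delay (25% by default)."""
    return fraction * cycle_time_ns(freq_mhz)

# 25 MHz is the figure quoted in the text; the others are sample values.
for f in (10, 25, 50):
    print(f"{f:3d} MHz: cycle = {cycle_time_ns(f):6.1f} ns, "
          f"path-delay budget = {path_delay_budget_ns(f):5.1f} ns")
```

At 25 MHz the cycle time is 40 ns, so only about 10 ns remains for buffers and other logic in the address and data path, which is why faster logic families are needed as the clock rises.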


Table 25-4: Importance of Speed
Clock Period (ns)
FASTr, BCT, FCTA


Reprinted by permission of Electronic Design Magazine, c. 1991.


Evolution of IC technology in Intel's x86 microprocessors 


Since 1971, when Intel introduced the first microprocessor, the 4004, until the 
introduction of the Pentium microprocessor, IC technology has gone through some mas- 
sive changes. The early processors (4004 and 8008) used PMOS. The 8080, 8085, 8088, 
8086, and 80286 all used NMOS when first introduced. In recent years, CMOS versions 
of the 8088, 8086, and 286 have been introduced for power-efficient systems. Currently, 
CMOS is the universal technology in the design of microprocessors. Only BICMOS could 
allow designers to put over 3 million transistors on a single chip, make it work at 1 GHz, 
and consume around 10 watts of power. There has been a steady decline in the transistor's 
dimensions throughout the 1980s, 1990s, and 2000s. The design rule, the thickness of the 
lines inside the IC, has come down from a few microns to a fraction of a micron during 
this time. See Table 25-5. 


Table 25-5: Intel Microprocessor Evolution


Microprocessor   Year   IC Tech   (µm)   Supply (V)   Number of Transistors

8086                                     5
80286                                                 134,000
80386                                                 275,000
80486                                                 1.2 million
Pentium          1993   BICMOS    0.8                 3.1 million
Pentium II                                            7.5 million
Pentium III      1999                                 9.5 million
Pentium 4        2000   BICMOS                        42 million


The early microprocessors used power supplies with negative (-) and positive (+)
voltages. For example, the 4004 used -10 and +5 V. The 8008 used -9 and +5 V, and
the 8080 used -5, +5, and +12 V. Since the introduction of the 8085, the use of a +5 V power
supply has become standard in all microprocessors. To reduce power consumption, 3.3 V
VCC is being embraced by many designers. The lowering of VCC to 3.3 V has two major




advantages: (1) it lowers the power consumption, resulting in prolonged battery life in 
systems such as a laptop PC or hand-held personal digital assistant, and (2) it allows a fur- 
ther reduction of line size (design rule) to submicron dimensions. This reduction results in 
putting more transistors in a given die size. The decline in the line size has reached 0.1 
micron, while the transistor density per chip is over 300 million. 


Review Questions 


1. State the main advantages of MOS and bipolar transistors.

2. True or false. In logic families, the higher the noise margin, the better.

3. True or false. Generally, high-speed logic consumes more power.

4. Power dissipation increases linearly with the increase in frequency in ______
   (CMOS, TTL).

5. In a CMOS inverter, indicate which transistor is on when the input is high.

6. For system frequencies of 10-30 MHz, which logic families are used for the address
   and data path?


SECTION 25.2: IC INTERFACING AND SYSTEM DESIGN 
ISSUES 




There are several issues to be considered in designing a microprocessor-based 
system. They are IC fan-out, capacitance derating, ground bounce, VCC bounce, 
crosstalk, transmission lines, power dissipation, and chip failure analysis. This section 
provides an overview of these design issues in order to provide a sampling of what is 
involved in high-performance system design. 


IC fan-out


In IC interfacing, fan-out/fan-in is a major issue. How many inputs can an output
signal drive? This question must be addressed for both logic "0" and logic "1" outputs.
Fan-out for low and fan-out for high are as follows:


$$\text{fan-out (of low)} = \frac{I_{OL}}{I_{IL}} \qquad \text{fan-out (of high)} = \frac{I_{OH}}{I_{IH}}$$


Of the above two values the lower number is used to ensure the proper noise mar- 
gin. Figure 25-5 shows the sinking and sourcing of current when ICs are connected. 

In Figure 25-5, as the number of inputs connected to the output increases, IOL
rises, which causes VOL to rise. If this continues, the rise of VOL makes the noise margin
smaller, and this results in the occurrence of false logic due to the slightest noise.

In designing the system, very often an output is connected to various kinds of
inputs. See Examples 25-1 and 25-2.

The total IIL and IIH requirement of all the loads on a given output must be less
than the driver's maximum IOL and IOH. This is shown in Example 25-3.

In cases such as Example 25-3 where the receiver current requirements exceed the
driver's capability, we must use a buffer (booster), such as the 74xx245 and 74xx244. The
74xx245 is used for bidirectional and the 74xx244 for unidirectional signals. See current
74LS244 and 74LS245 characteristics in Table 25-6.


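[Figure: a TTL output driving several inputs. With the output low, the driver sinks
IOL = sum of the IIL of the connected inputs, and VOL = Ron (transistor) x IOL.]

Figure 25-5. Current Sinking and Sourcing in TTL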

Example 25-1

Find how many unit loads (UL) can be driven by the output of the LS logic family.

Solution:


The unit load is defined as IIL = 1.6 mA and IIH = 40 µA. Table 25-1 shows IOH = 400 µA and
IOL = 8 mA for the LS family. Therefore, we have


fan-out (low) = IOL/IIL = 8 mA / 1.6 mA = 5
fan-out (high) = IOH/IIH = 400 µA / 40 µA = 10


This means that the fan-out is 5. In other words, the LS output must not be connected to more than
5 inputs with unit load characteristics.
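

The calculation in Example 25-1 can be sketched in a few lines of Python. This is only an illustration: the function name is ours, currents are passed as magnitudes in microamps so that integer arithmetic can be used, and the second call uses the standard TTL output values listed in Table 25-1.

```python
def fan_out(i_ol_ua, i_oh_ua, i_il_ua, i_ih_ua):
    """Return (fan-out low, fan-out high, usable fan-out).

    All currents are magnitudes in microamps. The usable fan-out is the
    smaller of the two values, as described in the text.
    """
    low = i_ol_ua // i_il_ua     # IOL / IIL
    high = i_oh_ua // i_ih_ua    # IOH / IIH
    return low, high, min(low, high)

# LS output driving unit loads (IIL = 1.6 mA, IIH = 40 uA), as in Example 25-1.
print(fan_out(i_ol_ua=8000, i_oh_ua=400, i_il_ua=1600, i_ih_ua=40))

# Standard TTL output (IOL = 16 mA, IOH = 400 uA) driving the same unit loads.
print(fan_out(i_ol_ua=16000, i_oh_ua=400, i_il_ua=1600, i_ih_ua=40))
```

Taking the minimum of the two ratios reflects the rule that the lower number must be used to preserve the noise margin.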


Table 25-6: Electrical Specifications for Buffers 


74LS244 
74LS245


Note: VOL = 0.4 V and VOH = 2.4 V are assumed. 




Example 25-2


An address pin needs to drive 5 standard TTL loads in addition to 10 CMOS inputs of DRAM chips.
Calculate the minimum current to drive these inputs for both logic "0" and "1".

Solution:


The standard load for TTL is IIH = 40 µA and IIL = 1.6 mA, and for CMOS, IIL = IIH = 10 µA.
minimum current for "0" = total of all IIL = 5 x 1.6 mA + 10 x 10 µA = 8.1 mA
minimum current for "1" = total of all IIH = 5 x 40 µA + 10 x 10 µA = 300 µA


[Figure: one address line driving the TTL loads and 10 CMOS inputs]


Example 25-3


Assume that the microprocessor address pin in Example 25-2 has specifications IOH = 400 µA and
IOL = 2 mA. Do the input and output current needs match?

Solution:


For a high output state, there is no problem since IOH (400 µA) is greater than the total IIH
(300 µA). However, the number of inputs exceeds the limit for IOL, since a total IIL of 8.1 mA
is much larger than the maximum IOL allowed by the microprocessor.
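

The load check worked through above can be written as a small Python sketch. The data layout (a list of tuples) and the function names are our own choices for illustration; currents are magnitudes in microamps.

```python
def total_load_ua(loads):
    """Sum the input currents of all loads.

    loads is a list of (count, i_il_ua, i_ih_ua) tuples.
    Returns (total IIL, total IIH) in microamps.
    """
    total_low = sum(count * i_il for count, i_il, i_ih in loads)
    total_high = sum(count * i_ih for count, i_il, i_ih in loads)
    return total_low, total_high

def driver_ok(i_ol_ua, i_oh_ua, loads):
    """Return (low state ok, high state ok) for a driver with the given IOL and IOH."""
    total_low, total_high = total_load_ua(loads)
    return i_ol_ua >= total_low, i_oh_ua >= total_high

# 5 standard TTL loads (IIL = 1.6 mA, IIH = 40 uA) plus 10 CMOS inputs (10 uA each).
loads = [(5, 1600, 40), (10, 10, 10)]

print(total_load_ua(loads))                              # totals in microamps
print(driver_ok(i_ol_ua=2000, i_oh_ua=400, loads=loads)) # address pin with IOL = 2 mA
```

When the low-state check fails, as it does for the address pin here, a buffer with a higher IOL rating is placed between the pin and the loads.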


Capacitance derating 


Next we study what is called capacitance derating and its impact on system
design. A pin of an IC has an input capacitance of 5 to 7 pF. This means that a single
output that drives many inputs sees a large capacitance load, since the inputs are in
parallel and their capacitances therefore add together. Look at the following equations.


$$Q = CV \tag{25-1}$$
$$\frac{Q}{T} = \frac{CV}{T} \tag{25-2}$$
$$F = \frac{1}{T} \tag{25-3}$$
$$I = CVF \tag{25-4}$$


In Equation (25-4), I is the driving capability of the output pin, C is CIN as seen
by the output, and V is the voltage. The equation indicates that as the number of CIN loads
goes up, there must be a corresponding increase in I, the driving capability of the
output. In other words, outputs with high values of IOL and IOH are desirable. Although
recently there have been some logic families with IOL = 64 mA and IOH = 15 mA, their power




consumption is high. Equation (25-4) indicates that if I is constant, then as C goes up, F must
come down, resulting in lower speed. The most widely accepted solution is the use of a
large number of drivers to reduce the load capacitance seen by a given output. Assume
that we have a single address bus line driving 16 banks of 32-bit-wide memory. Each
bank has 4 chips of 64K x 8 organization, which results in 16 x 4 = 64 memory chips, or
16 x 64K x 32 = 32M bits (4M bytes) of SRAM. Depending on how many 244s are used to drive
the memory addresses, the delay due to the address path varies substantially. To
understand this we examine three cases.
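
To get a feel for Equation (25-4), the Python sketch below evaluates I = CVF for a sample load and also solves it for F. The function names and the sample numbers (64 inputs of 5 pF each, a 5 V swing, 25 MHz, and a 24 mA driver) are illustrative assumptions of ours, and the relation is used here only as a first-order estimate.

```python
def drive_current(c_farads, v_volts, f_hertz):
    """Current needed to charge capacitance C to voltage V at frequency F (I = CVF)."""
    return c_farads * v_volts * f_hertz

def max_frequency(i_amps, c_farads, v_volts):
    """Highest frequency a driver with current I can sustain (F = I / (CV))."""
    return i_amps / (c_farads * v_volts)

load_c = 64 * 5e-12                      # 64 inputs at 5 pF each = 320 pF
needed = drive_current(load_c, 5.0, 25e6)
limit = max_frequency(24e-3, load_c, 5.0)

print(f"load = {load_c * 1e12:.0f} pF")
print(f"current needed at 25 MHz = {needed * 1e3:.1f} mA")
print(f"frequency limit with a 24 mA driver = {limit / 1e6:.1f} MHz")
```

Holding I fixed while C grows lowers the achievable F, which is the effect the text calls capacitance derating.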


Case 1: Two 244 drivers 


This option uses two 244 drivers, one for A0-A7 and one for A8-A15. An output
of the 244 drives 16 banks of memory, each with 4 inputs. Assuming that each memory
input has 5 pF capacitance, this results in a total of 4 x 16 x 5 = 320 pF capacitance
load seen by the 244 output. However, the 244 output can handle no more than 50 pF. As
a result, the delay due to this extra capacitance must be added to the address path delay.
For each 50 to 100 pF of capacitance, an extra 3 ns delay is added to the address path
delay. In our calculation, we use 3 ns for each 100 pF of capacitance. Figure 25-6 shows
driving memory inputs by two 244 chips. See Example 25-4.


[Figure: address lines from a 244 driver fanning out to the memory chips of banks 1
through 4, each bank supplying data lines up to D30 and D31]

Figure 25-6. Case 1: Two 244 Address Drivers (the second 244 for A8-A15 is not shown)


Example 25-4 


Calculate the following for Figure 25-6, assuming a memory access time of 25 ns and a propagation
delay of 10 ns for the 244.

(a) delay due to capacitance derating on the address path

(b) the total address path delay for case 1

Solution:


(a) Of the 320 pF capacitance seen by the 244, only 50 pF is taken care of; the rest, which is 270 pF
(320 - 50 = 270), causes a delay. Since there are 3 ns for each extra 100 pF, we have the following
delay due to capacitance derating: (270/100) x 3 ns = 8.1 ns.

(b) Address path delay = 244 buffer propagation delay + capacitance derating delay + memory
access time = 10 ns + 8.1 ns + 25 ns = 43.1 ns.




Case 2: Doubling the number of 244 buffers 


Doubling the number of 244 buffers will reduce the address path delay. A single 
244 drives only 8 banks, or a total of 32 inputs, since there are 4 inputs in each bank. As 
a result, a 244 output will see a capacitance load of 32 x 5 = 160 pF. In this case, we use 
only four 244 buffer chips, as shown in Figure 25-7 and Example 25-5. 


[Figure: address lines from the 244 drivers fanning out to banks 1 through 4, each bank
supplying data lines D0 through D31]

Figure 25-7. Case 2: Four 244 Address Drivers (the two 244s for A8-A15 are not shown)


Example 25-5


Calculate (a) delay due to capacitance derating on the address path, and (b) total address path delay
for case 2. Assume a memory access time of 25 ns and a propagation delay of 10 ns for the 244.


Solution:


(a) Of the 160 pF capacitance seen by the 244, only 50 pF is taken care of; the rest, which is 110 pF,
causes a delay. Since there are 3 ns for each extra 100 pF, we have (110/100) x 3 ns = 3.3 ns delay
due to capacitance derating.

(b) The address path delay = 244 buffer propagation delay + capacitance derating delay + memory
access time = 10 ns + 3.3 ns + 25 ns = 38.3 ns.


Case 3: Doubling again 


In this case, we double the number of 244 buffers again, so that an output of the 
244 drives four banks, each with 4 inputs. This results in a total capacitance load of 4 x 4 
x 5 = 80 pF. Only 50 pF of it is taken care of by the 244, leaving 30 pF, causing a delay. 
See Figure 25-8. 

Examining cases 1 through 3 shows that for high-speed system design we must 
accept a higher cost due to extra parts and higher power consumption. 
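
The derating rule used in these cases (50 pF handled by the 244, 3 ns for each extra 100 pF) can be applied to all three cases with the Python sketch below. The function names and default parameters are ours; the values themselves (5 pF per input, 4 inputs per bank, 10 ns buffer delay, 25 ns access time) are the ones assumed in the text.

```python
def derating_delay_ns(load_pf, rated_pf=50.0, ns_per_100pf=3.0):
    """Extra delay caused by capacitance beyond what the driver output can handle."""
    excess_pf = max(0.0, load_pf - rated_pf)
    return excess_pf / 100.0 * ns_per_100pf

def address_path_delay_ns(banks_per_output, inputs_per_bank=4, pf_per_input=5.0,
                          buffer_ns=10.0, access_ns=25.0):
    """Buffer propagation delay + derating delay + memory access time."""
    load_pf = banks_per_output * inputs_per_bank * pf_per_input
    return buffer_ns + derating_delay_ns(load_pf) + access_ns

# Cases 1, 2, and 3: each 244 output drives 16, 8, and 4 banks respectively.
for case, banks in ((1, 16), (2, 8), (3, 4)):
    load = banks * 4 * 5.0
    print(f"Case {case}: load = {load:5.0f} pF, "
          f"derating = {derating_delay_ns(load):4.1f} ns, "
          f"address path = {address_path_delay_ns(banks):5.1f} ns")
```

Each doubling of the number of drivers halves the load per output and shortens the address path, at the price of more parts and more power.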


Power dissipation considerations 


Power dissipation of a system is a major concern of system designers, especially
for laptop and hand-held systems. Although power dissipation is a function of the total
current consumption of all components of a system, the impact of VCC is much more
pronounced, as shown next. Earlier we showed in Equation (25-4) that I = CFV. Substituting
this in the equation P = VI yields the following:


$$P = VI = CFV^2 \tag{25-5}$$


[Figure: one 244 address driver per memory bank, shown for banks 1 through 4, each bank
supplying data lines D0 through D31]

Figure 25-8. Case 3: A Single 244 Address Driver for Each Bank (A8-A15 are not shown)


Example 25-6


Prove that a 3.3 V system consumes 56% less power than a system with a 5 V power supply.


Solution:


Since P = VI, by substituting I = V/R, we have P = V^2/R. Assuming that R = 1, we have
P = (3.3)^2 = 10.89 W and P = (5)^2 = 25 W. This results in using 14.11 W less
(25 - 10.89 = 14.11), which means a 56% power saving (14.11 W / 25 W x 100 = 56%).


In Equation (25-5), the effects of frequency and VCC voltage should be noted. 
While the power dissipation goes up linearly with frequency, the impact of the power sup- 
ply voltage is much more pronounced (squared). See Example 25-6. 
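
The linear effect of frequency and the squared effect of supply voltage in Equation (25-5) can be checked with the Python sketch below. The function names and the sample capacitance and frequency are illustrative assumptions of ours; only the 3.3 V and 5 V supply values come from the text.

```python
def dynamic_power(c_farads, f_hertz, v_volts):
    """Dynamic power from Equation (25-5): P = C * F * V^2."""
    return c_farads * f_hertz * v_volts ** 2

def saving_percent(v_new, v_old):
    """Percentage of power saved by lowering the supply from v_old to v_new,
    with C and F (or R) held constant."""
    return (1.0 - (v_new / v_old) ** 2) * 100.0

c, f = 50e-12, 25e6     # sample values: 50 pF switched at 25 MHz

p_5v = dynamic_power(c, f, 5.0)
p_3v3 = dynamic_power(c, f, 3.3)
print(f"P at 5 V   = {p_5v * 1e3:.2f} mW")
print(f"P at 3.3 V = {p_3v3 * 1e3:.2f} mW")
print(f"Saving     = {saving_percent(3.3, 5.0):.1f} %")
print(f"Doubling F multiplies P by {dynamic_power(c, 2 * f, 5.0) / p_5v:.1f}")
```

The saving depends only on the ratio of the two voltages, so the same percentage appears whatever values are chosen for C and F.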


Dynamic and static currents 


There are two major types of currents flowing through an IC: dynamic and static. 
A dynamic current is a function of the frequency under which the component is working, 
as seen in Equation (25-4). This means that as the frequency goes up, the dynamic current 
and power dissipation go up. The static current, also called dc, is the current consumption 
of the component when it is inactive (not selected). 


Power-down option 


The popularity of laptop PCs has led microprocessor designers to make an all-out
effort to conserve battery power. Today's processors have what is called system management
mode (SMM), which reduces energy consumption by turning off peripherals or the
entire system when not in use. According to Intel, SMM can put the entire system, including
the monitor, into sleep mode during periods of inactivity, thereby reducing "power
from 250 watts to less than 30 watts." The use of a 3.3 V power supply alone translates
into a power savings of up to 56% over systems with a 5 V power supply, as was
shown in Example 25-6.


CHAPTER 25: SYSTEM DESIGN ISSUES AND FAILURE ANALYSIS 649 


Ground bounce 


One of the major issues that designers of high-frequency systems must grapple 
with is ground bounce. Before we define ground bounce, we will discuss lead inductance 
of IC pins. There is a certain amount of capacitance, resistance, and inductance associat- 
ed with each pin of the IC. The size of these elements varies depending on many factors 
such as length, area, and so on. Figure 25-9 shows the lead inductance and capacitance 
of the 24 pins of a DIP IC. 


Pin   Self-inductance   Capacitance

 1    15.10 nH          1.86 pF
 2    12.20 nH          1.70 pF
 3     9.54 nH          1.29 pF
 4     7.44 nH          0.95 pF
 5     5.37 nH          0.61 pF
 6     3.73 nH          0.43 pF
 7     3.41 nH          0.43 pF
 8     4.66 nH          0.61 pF
 9     6.95 nH          0.95 pF
10     8.96 nH          1.29 pF
11    14.70 nH          1.70 pF
12    14.50 nH          1.86 pF
13    14.50 nH          1.86 pF
14    11.70 nH          1.70 pF
15     8.96 nH          1.29 pF
16     6.95 nH          0.95 pF
17     4.66 nH          0.61 pF
18     3.41 nH          0.43 pF
19     3.73 nH          0.43 pF
20     5.31 nH          0.61 pF
21     7.44 nH          0.95 pF
22     9.54 nH          1.29 pF
23    12.20 nH          1.70 pF
24    15.10 nH          1.86 pF


Figure 25-9. Inductance and Capacitance of 24-pin DIP 


Reprinted by Permission of Electronic Design Magazine, c. 1992. 


The inductance of the pins is commonly referred to as self-inductance since there 
is also what is called mutual inductance, as we will show below. Of the three components 
of capacitance, resistance, and inductance, self-inductance is the one that causes the most 
problems in high-frequency system design since it can result in ground bounce. Ground 
bounce is caused when a massive amount of current flows through the ground pin when
multiple outputs change from high to low all at the same time. The voltage relation to the
inductance of the ground lead follows:


V = L (di/dt) (25-6)


As we increase the system frequency, the rate of dynamic current, di/dt, is also 
increased, resulting in an increase in the inductance voltage L (di/dt) of the ground pin. 
Since the low state (ground) has a small noise margin, any extra voltage due to the induc- 
tance voltage can cause a false signal. To reduce the effect of ground bounce, the follow- 
ing steps must be taken where possible. 


1. The VCC and ground pins of the chip must be located in the middle rather than at the
opposite ends of the IC chip (the 14-pin TTL logic IC uses pins 7 and 14 for ground
and VCC). This is exactly what we see in high-performance logic gates such as Texas
Instruments' advanced logic AC11000 and ACT11000 families. For example, the
ACT11013 is a 14-pin DIP chip where pins 4 and 11 are used for the ground and
VCC instead of 7 and 14 as in the TTL. We can also use the SOIC packages instead
of DIP. The self-inductance of the leads is shown in Table 25-7.

2. Use logics with a minimum number of outputs. For example, a 4-output is preferable
to an 8-output. Many high-performance systems avoid using memory chips
or the drivers and buffers of 16- or 32-bit-wide outputs since all the outputs switching
at the same time will cause a massive flow of current in the ground pin, and hence
cause ground bounce (see Figure 25-10).

3. Use as many pins for the ground and VCC as possible to reduce the lead length, since
the self-inductance of a wire with length l and a cross section of B x C is:


L = 0.002 l ln( 2l/(B + C) + 1/2 ) (25-7)


Table 25-7: 20-Pin DIP and SOIC Lead Inductance

Pins          DIP (nH)   SOIC (nH)
1,10,11,20    13.7       4.2
2,9,12,19     ...        ...
3,8,13,18     8.6        3.3
4,7,14,17     ...        2.9
5,6,15,16     3.4        2.4

(Courtesy of Texas Instruments)


As seen in Equation (25-7), the wire length, l, contributes more to self-inductance
than does the cross section. This explains why all high-performance microprocessors and
logic families use several pins for the VCC and ground. For example, in the case of Intel's
Pentium processor there are over 50 pins for the ground and another 50 pins for the VCC.

The discussion of ground bounce is also applicable to VCC when a large number
of outputs change from the low to the high state; this is referred to as VCC bounce. However,
the effect of VCC bounce is not as severe as ground bounce since the high ("1") state has
a wider noise margin than the low ("0") state.
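
To get a feel for the magnitudes involved, Equation (25-6) can be evaluated for two of the lead inductances listed in Figure 25-9. This is only a sketch: the number of outputs, the current swing per output, and the switching time below are assumed values for illustration, not figures from the text.

```python
def lead_voltage(l_henries, di_amps, dt_seconds):
    """Voltage across a lead inductance, V = L * (di/dt), Equation (25-6)."""
    return l_henries * di_amps / dt_seconds


# Lead inductances taken from Figure 25-9 (24-pin DIP).
L_CORNER_PIN = 15.10e-9   # pin 1, at the end of the package
L_MIDDLE_PIN = 3.41e-9    # pin 7, near the middle of one side

# Assumptions for illustration only.
N_OUTPUTS = 8             # outputs switching high to low together
DI_PER_OUTPUT = 10e-3     # 10 mA current change per output
DT = 2e-9                 # 2 ns switching time

total_di = N_OUTPUTS * DI_PER_OUTPUT
v_corner = lead_voltage(L_CORNER_PIN, total_di, DT)
v_middle = lead_voltage(L_MIDDLE_PIN, total_di, DT)

print(f"Ground pin at corner: {v_corner:.3f} V")
print(f"Ground pin in middle: {v_middle:.3f} V")
```

Under these assumptions the corner pin develops several times the bounce voltage of the middle pin, which is consistent with the recommendation to place VCC and ground on the shortest leads.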


Filtering the transient currents using decoupling capacitors 


In the TTL family, the change of the output from low to high can cause what is
called transient current. In totem-pole output, when the output is low, Q4 is on and saturated,
whereas Q3 is off. When the output changes from the low to the high state, Q3 turns
on and Q4 turns off. This means that there is a time when both transistors are on and
drawing current from the VCC. The amount of current depends on the RON values of the
two transistors, and that, in turn, depends on the internal parameters of the transistors.
However, the net effect of this is a large amount of current in the form of a spike for the
output current, as shown in Figure 25-10. To filter the transient current, a 0.01 µF or 0.1 µF
ceramic disk capacitor can be placed between the VCC and ground for each TTL IC.


[Diagram: (a) ground bounce occurs when data switches from all 1s to all 0s; (b) transient
current occurs when the output goes from 0 to 1.]


Figure 25-10. (a) Ground Bounce (b) Transient Current


However, the lead for this capacitor should be as short as possible since a long lead results
in a large self-inductance and that results in a spike on the VCC line [V = L (di/dt)]. This
is also called VCC bounce. The ceramic capacitor for each IC is referred to as a decoupling
capacitor. There is also a bulk decoupling capacitor, as described next.


Bulk decoupling capacitor 


As many IC chips change state at the same time, the combined currents drawn
from the board's VCC power supply can be massive and cause a fluctuation of VCC on
the board where all the ICs are mounted. To eliminate this, a relatively large (relative to
an IC decoupling capacitor) tantalum capacitor is placed between the VCC and ground
lines. The size and location of this tantalum capacitor vary depending on the number of
ICs on the board and the amount of current drawn by each IC, but it is common to have a
single 22 µF to 47 µF capacitor for every 16 devices, placed between the VCC and
ground lines. See Technical Notes TN0006 and TN4602 from Micron Technology.


http://download.micron.com/pdf/technotes/TN0006.pdf 
http://download.micron.com/pdf/technotes/DDR/TN4602.pdf 


Crosstalk 


Crosstalk is due to mutual inductance. See Figure 25-11. Previously, we discussed 
self-inductance, which is inherent in a piece of conductor. Mutual inductance is caused by 
two electric lines running parallel to each other. It is calculated as follows: 


M = 0.002 1 n2% -in ( —22_4.L 2588 
d B OD ( ) 

where l is the length of two conductors running in parallel, d is the distance
between them, and the medium material placed in between affects K. Equation (25-8)
indicates that the effect of crosstalk can be reduced by increasing the distance between the
parallel or adjacent lines (in printed circuit boards, these will be traces). In many cases,
such as printer and disk drive cables, there is a dedicated ground for each signal. Placing
ground lines (traces) between signal lines reduces the effect of crosstalk. This method is
used even in some ACT logic families where VCC and GND pins are next to each other.
Crosstalk is also called EMI (electromagnetic interference). This is in contrast to ESI
(electrostatic interference), which is caused by capacitive coupling between two adjacent
conductors.


Figure 25-11. Crosstalk (EMI)


Transmission line ringing 


The square wave used in digital circuits is in 
reality made of a single fundamental pulse and many 
harmonics of various amplitudes. When this signal trav- 
els on the line, not all the harmonics respond the same 
way to the capacitance, inductance, and resistance of the 
line. This causes what is called ringing, which depends 
on the thickness and the length of the line driver, among 
other factors. To reduce the effect of ringing, the line 
drivers are terminated by putting a resistor at the end of 
the line. See Figure 25-12. There are three major meth- 
ods of line driver termination: parallel, serial, and 
Thevenin. In many systems resistors of 30-50 ohms are 
used to terminate the line. The parallel and Thevenin 
methods are used in cases where there is a need to match 
the impedance of the line with the load impedance. This 
requires a detailed analysis of the signal traces and load 
impedance, which is beyond the scope of this volume. In 
high-frequency systems, wire traces on the printed cir- 
cuit board (PCB) behave like transmission lines, causing 
ringing. The severity of this ringing depends on the 
speed and the logic family used. Table 25-8 provides the 
length of the traces, beyond which the traces must be 
looked at as transmission lines. 


[Diagram: ringing at a buffer output, with series termination and parallel termination.]


Figure 25-12. Reducing Transmission Line Ringing


Table 25-8: Line Length Beyond Which Traces Behave Like Transmission Lines

Logic Family   Line Length (in)
LS             25
S, AS          11
F, ACT         8
AS, ECL        6
FCT, FCTA      5

(Reprinted by permission of Integrated Device Technology, copyright IDT 1991)


FIT and failure analysis


Chip manufacturers provide a parameter called FIT (failure in time) to measure
the reliability for a single chip. The FIT of a single chip is the number of expected failures
in a billion (10^9) hours of operation. If a chip has a FIT of 300, then there will be 300
failures per billion device hours of operation. To reduce the number of device failures,
manufacturers use burn-in to eliminate the early failures before the product is shipped to
the customer. This is commonly referred to as infant mortality since the failure rate starts
high and eventually levels off to a constant level. See Figure 25-13. Although we can
eliminate the early failures using burn-in, we can never reduce the failure rate to zero due
to wearout and other factors such as soft error. This is discussed next.


[Graph: failure rate on the vertical axis, with an infant mortality region followed by an
inherent reliability region.]


Figure 25-13. Bathtub Failure Rate


Soft error and hard error 


In memory there are two kinds of errors that can cause a bit to change: soft error 
and hard error. If the cell bit gets stuck permanently in a "high" or "low" state, this is 
referred to as a hard error. Hard error is due to deterioration of the cell caused by wear- 
out (see Figure 25-13). There is no remedy for hard error except to replace the defective 
RAM chip since the damage is permanent. The other kind of error, a soft error, alters the 
cell bit from 1 to 0 or from 0 to 1, even though the cell is perfectly fine (no hard error). 
Soft error is caused by alpha particle radiation and power surges. The sources of the alpha 
particles are the radiation in the air or the materials in the plastic package enclosing the 
RAM die. The occurrence of a soft error as a result of alpha particles ionizing the charges 
in a RAM cell is a greater source of concern since it is 5 times more likely to happen than 
a hard error. As the density of RAM chips increases and the size of the RAM cell goes 
down, the probability of a soft error for a given cell goes up, but the relation is not linear. 


Mean time between failures (MTBF) for system 


The reliability of a system depends directly on two factors: a) the FIT (failures in time)
value of a single part, and b) the number of parts in the system. We use these two factors 
to calculate what is called MTBF (mean time between failures). The MTBF predicts the 
average time before the first failure happens. The MTBF for a single chip is calculated 
using the FIT as follows: 


MTBF = 10^9 / FIT hours (25-9)


To get the MTBF rate for the system, we must divide the single-chip MTBF by 
the number of chips in the system. 


MTBF of system = MTBF of one chip / number of chips in system (25-10)


See Examples 25-7 and 25-8. 


See Technical Notes TN-00-14 and TN-00-18 on the www.micron.com website. 


http://download.micron.com/pdf/technotes/TN0014.pdf 




http://download.micron.com/pdf/technotes/TN0018.pdf 


The paper “Testing RAM for Embedded Systems” by Jack Ganssle
is available from the following website:


http://www.ganssle.com/testingram.pdf 


Also see the article “Thirteen feet of concrete won't shield your RAM from the 
perils of cosmic rays. What's the solution?” by Jack Ganssle in Dr. Dobb’s Journal. It is 
available from the following website: 


http://www.ddj.com/dept/debug/196800160


Example 25-7 


Assuming that the FIT for a single chip is 252, calculate the MTBF for: 
(a) a single chip 
(b) a system with 512 chips 


Solution: 
(a) The MTBF for a single chip is as follows: MTBF for 1 chip = 1,000,000,000 hr / 252 =
3,968,254 hr = 453 years
(b) The MTBF for 512 chips is 453 years / 512 chips = 0.884 year = 323 days


Example 25-8 
Calculate the system MTBF for the system in Example 25-7 if FIT = 745. 


Solution: 


MTBF for a single chip = 10^9 / 745 hr = 153 years. For the system it is 153 years / 512 = 109 days.
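
The arithmetic of Examples 25-7 and 25-8 can be reproduced with a short script. This is a sketch that assumes a 365-day year (8,760 hours); the function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # assumes a 365-day year


def mtbf_hours(fit):
    """Single-chip MTBF in hours: 10^9 / FIT, Equation (25-9)."""
    return 1e9 / fit


def system_mtbf_hours(fit, n_chips):
    """System MTBF: single-chip MTBF divided by the number of chips."""
    return mtbf_hours(fit) / n_chips


for fit in (252, 745):
    chip_years = mtbf_hours(fit) / HOURS_PER_YEAR
    system_days = system_mtbf_hours(fit, 512) / 24
    print(f"FIT = {fit}: one chip {chip_years:.0f} years, "
          f"512 chips {system_days:.0f} days")
```

The script makes the main point of the examples easy to see: a part that looks extremely reliable on its own yields a system MTBF of less than a year once hundreds of them are combined.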


ECL and gallium arsenide (GaAs) chips 


The use of secondary cache (L2: level 2 cache, as many call it) and EDC in sys- 
tems with speeds of 66 MHz and higher is adding to the data and address path delay. This 
is forcing designers to resort to using ECL and GaAs chips. Due to the fact that ECL chips 
have a very high power dissipation, they are not used in low-cost x86 design. However, 
GaAs chips are showing up in high-speed x86 and RISC-based computers. This is especially
the case for the GaAs EDC and cache controller chips. The effective mass of electrons in
GaAs is lower than in silicon because of its band structure. As a result, the
electrons in GaAs have a much higher mobility. This means that GaAs chips can achieve a
much higher speed than silicon. The power dissipation of the GaAs transistor is comparable
to the silicon-based MOS transistor. Therefore, GaAs technology might appear to provide
the ideal chip since it has the speed of ECL (it is even faster than ECL) and the power
dissipation of CMOS. However, it has the following disadvantages.


1. Unlike silicon, of which there is a plentiful supply in nature in the form of sand,
GaAs is a rare commodity, and therefore more expensive.
2. GaAs is a compound made of two elements, Ga and As, and therefore is unstable at
high temperatures.
3. It is very brittle, making it impossible to have large wafers. As a consequence, at this
time no more than 100,000 transistors can be placed on a single chip. Contrast this to
the millions of transistors for silicon-based chips.
4. The GaAs yields are much lower than for silicon, making the cost per chip much
more expensive than for silicon chips.


These problems make the building of an entire computer based on GaAs a vision- 
ary project, if not an impossible one. This was the case for the CRAY III supercomputer, 
which was based on GaAs, and the buses ran at speeds of multiple GHz; but the project 
was also several years behind and millions of dollars over budget, so it was eventually 
abandoned and the company went out of business. 


Review Questions 


1. What is the fan-out of the "0" state?

2. If the fan-out of "low" and "high" are 10 and 15, respectively, what is the fan-out?

3. If IOL = 12 mA, IOH = 3 mA for the driver, and IIL = 1.6 mA, IIH = 40 µA for the load,
find the fan-out.

4. Why do IIL and IOH have negative signs in many TTL books?

5. What are the 74xx244 and 74xx245 used for?

6. What is capacitive derating?

7. Ground bounce happens when the output makes a transition from ____ to ____.

8. Give one way to reduce ground bounce.

9. Transient current is due to transition of output from ____ to ____.

10. Why do high-speed logic gates using DIP packaging put the VCC and ground pins in 
the middle instead of the corners? 

11. True or false. Soft error is permanent. 

12. True or false. Hard error is permanent. 

13. Alpha particle radiation causes (soft, hard) errors. 

14. FIT is in (hours, months, years) of device operation. 

15. What is the MTBF for 512 megabytes of memory if DRAM chips used are 16M x 8 
with FIT = 252? 

16. What is the MTBF for 512 megabytes of memory if DRAM chips used are 16M x 8 
with FIT = 1000? 


PROBLEMS 


SECTION 25.1: OVERVIEW OF IC TECHNOLOGY 


1. Why do bipolar transistors dissipate more power? 

2. Why is the MOS transistor slower than the bipolar? 

3. Why has the use of NMOS replaced PMOS? 

4. For a TTL inverter indicate which transistors are "on" and "off" for the following.
(a) input = high (b) input = low 

5. Repeat Problem 4 for CMOS. 

6. Why in CMOS does the current dissipation rise as the frequency goes up? 

7. What is the purpose of the Schottky diode in the 74LSxx family? 

8. What is the noise margin for "0" and "1" in the LS family? 

9. What is the noise margin for "0" and "1" in the HCMOS family? 

10. Which one uses more static current, LS or HCMOS? 

11. Which one is faster, LS or ALS? 

12. Which one is more power efficient, LS or ALS? 

13. Which one is faster, AC/ACT or FCT? Which one is more power efficient? 

14. What is the FCT logic family? 

15. What is the advantage of FASTr over FAST logic? 




16. What is the BCT logic family? 

17. True or false. The LS family is used for system frequency of less than 10 MHz. 
18. True or false. The BCT family is used for system frequency of less than 15 MHz. 
19. Pentium uses a line size of ____ micron.

20. Why is CMOS the technology of choice in microprocessor design? 


SECTION 25.2: IC INTERFACING AND SYSTEM DESIGN ISSUES 


21. Calculate the fan-out if LS drives ALS. 

22. Calculate the fan-out for LS driving unit loads. 

23. Calculate the fan-out for ALS driving unit loads. 

24. Calculate the number of LS that the 74LS244 can drive.

25. Find IOL and IOH needed to drive 10 LS and 20 CMOS input loads.

26. True or false. Capacitance derating is a function of frequency. 

27. To minimize capacitance derating, use (high, low) drive capability logics. 

28. Calculate the path delay if one 244 is driving 2 banks each with 32 inputs and Cin
for each input is 7 pF. Assume that the 244 delay is 8 ns and memory access time =
15 ns.

29. Repeat Problem 28 where the number of 244s is doubled. 

30. Repeat Problem 29 where the number of 244s is doubled again. 

31. Which current is a function of frequency, dynamic or static? 

32. Give the advantages of lower VCC. 

33. If VCC = 3.7 V, compare the power dissipation in comparison with VCC 


34. If VCC = 1.8 V, compare the power dissipation in comparison with VCC
= 5 V.

35. Repeat Problem 34 using VCC = 3.3 V. 

36. In many IC chips VCC and GND are located at the middle of the DIP package instead
of in the corners. Discuss the effect of VCC and GND pin locations in such cases. 

37. Discuss the causes and cures for ground bounce. 

38. Why is the effect of VCC bounce less severe than ground bounce? 

39. Discuss the cause of transient current and ways to reduce its effects. 

40. Discuss why many 245 drivers are used for the system data bus. 

41. Discuss the cause of crosstalk and methods to reduce it. 

42. What is the cause of ringing? 


43. PCB traces behave like transmission lines most in the ____ logic family and least
in the ____ logic family.

44. What is the purpose of 30-50 ohms resistance at the end of lines driving the DRAM 
arrays? 


45. Discuss the difference between soft error and hard error. 

46. Give the main causes of soft error and hard error. 

47. Calculate the MTBF for 16M of memory using 1M x 1 and FIT = 252. 
48. Calculate the MTBF for Problem 47 with EDC if the word size is 32-bit. 


ANSWERS TO REVIEW QUESTIONS 


SECTION 25.1: OVERVIEW OF IC TECHNOLOGY


1. MOS is more power efficient, while bipolar is faster. 
2. lie 

3. True 

4. CMOS

5. NMOS

6. In the lower end, ALS, and in the higher end, FAST 




SECTION 25.2: IC INTERFACING AND SYSTEM DESIGN ISSUES 


1. It is the number of loads that the driver can support and it is calculated by IOL/IIL.

2. 10

3. IOL/IIL = 12 mA/1.6 mA = 7.5 and IOH/IIH = 3 mA/40 µA = 75. Fan-out is 7, the lower
number (rounded down to a whole load).

4. The negative sign indicates that these currents are flowing out of the IC (conventional
current flow).

5. They are used as line drivers: the 74xx244 for unidirectional and the 74xx245 for
bidirectional lines.

6. It is signal delay caused by excessive load capacitance.

7. High, low

8. Make the ground pin length as small and short as possible.

9. Low, high

10. To make the self-inductance of pins VCC and GND small in order to reduce the
ground and VCC bounce

11. False

12. True

13. Soft

14. Hours

15. 453/32 = 14.1 years since we have 512M x 8/16M x 8 = 32 chips

16. 3.56 years (114.15 years for one DRAM divided by 32 chips)


CHAPTER 26 


ISA, PC104, AND PCI BUSES 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Describe the ISA bus signals for I/O interfacing
>> Describe the ISA bus signals for memory interfacing
>> Calculate I/O cycle time and bus bandwidth for the ISA bus
>> Define the meaning of the terms master, slave, bus arbitration, bus
   protocol, and bus bandwidth and describe their importance in PC design
>> Describe the evolution of bus architecture from ISA to PCI
>> List the limitations of the ISA bus
>> List the major characteristics of PCI architecture
>> List the enhancements of the PCI bus over the ISA bus
>> Contrast and compare ISA and PCI in terms of bus bandwidth
>> Define the term local bus and describe its merits
>> List the major characteristics of the PCI local bus


This chapter explores the ISA and PCI expansion slot buses. In Section 26.1 we 
present ISA bus memory signals. The I/O signals and specifications for the ISA bus are 
discussed in Section 26.2. The PCI bus is covered in Section 26.3. 


SECTION 26.1: ISA BUS MEMORY SIGNALS 


In Chapter 9 we covered the basics of ISA bus signals. In this section we provide 
more details of the ISA bus for memory interfacing including the memory read/write cycle 
time. In PCs with x86 microprocessors, the signals for the ISA expansion slots are provid- 
ed by the chipset. The chipset makes sure that the signals for the ISA slot conform with 
the ISA bus standard regardless of the CPU’s speed and data width. The ISA bus specifi- 
cations and timing for memory are precise and must be understood if we want to design 
an ISA plug-in card with on-board memory. Next, we review once more the signals relat- 
ed to memory in the ISA expansion slot. 


Address bus signals 


SA0-SA19 (system address) 


The system address bus provides the address signals for the desired memory (or 
I/O) location. The chipset latches these signals and holds them valid throughout the bus 
cycle. See Figure 26-1. 


LA17-LA23 (latchable address) 


These signals, along with SA0-SA19, allow access to 16M bytes of memory
space from the ISA expansion slot. The chipset does not latch these signals. They must be 
latched by the board designed for the expansion slot. 


SBHE (system byte high enable) 


Because it is an active low signal, when it is low it indicates that data is being 
transferred on the upper byte (D8-D15) of the data bus.


SD0-SD15 (system data bus) 


The system data bus (SD0-SD15) is used to transfer data between the CPU,
memory, and I/O devices. 


Memory control signals 


MEMW (memory write) 


An active-low control signal is used to write data into the memory chip. This sig- 
nal is connected to the WE (write enable) pin of the memory chip. This signal can be used 
to access the entire 16M allowed by the ISA bus. 


MEMR (memory read) 


This active-low control signal is used to read data from the memory chip. It is 
connected to the OE (output enable) pin of the memory chip. This signal can be used to 
access the entire 16M allowed by the ISA bus. MEMW and MEMR are used for 16M 
memory. However, if the 1M memory 00000-FFFFFH is chosen, the following control 
signals must be used. 


SMEMW (system memory write) 


An active-low control signal used to write data into a memory chip. This signal is 
connected to the WE pin of the memory chip. This signal goes low when accessing 
addresses between 0 and FFFFFH (0 and 1M bytes). 


SMEMR (system memory read)


An active-low control signal used to read data from the memory chip. This signal 
is connected to the OE pin of the memory chip. This signal goes low when accessing 
addresses between 0 and FFFFFH (0 and 1M bytes). 

Although the ISA bus has a 16-bit data bus (D0-D15), either the 8-bit section
(D0-D7) or the entire 16 bits (D0-D15) can be used. This is decided by the input pin
MEMCS16, as explained next.


[Diagram: ISA slot pinout from the rear panel to the component side. Signals shown include
-I/O CH CK, SD7-SD0, -I/O CH RDY, AEN, SA19-SA0, SBHE, LA23-LA17, -MEMR, -MEMW,
and SD08-SD15.]


Figure 26-1. ISA (IBM PC AT) Bus Slot Signals 


(Reprinted by permission from “IBM Technical Reference” c. 1985 by International Business Machines Corporation) 


CHAPTER 26: ISA, PC104, AND PCI BUSES 661 


MEMCS16 (memory chip select 16) 


This is an input signal and is active low. When not asserted, it indicates to the 
chipset that only the D0-D7 portion of the data bus is being used. Notice that the 8-bit
portion is the default mode and it is achieved by doing nothing to this pin. In contrast, 
when this signal is asserted low, both the low byte and high byte of the data bus (D0-D15) 
will be used for data transfer. Therefore, to use the entire 16-bit data bus, this pin must be 
low. 

The ISA bus allows the interfacing of slow memories by inserting wait states into 
the memory cycle time. This prolonging of memory cycle time is available for both 8-bit 
and 16-bit data transfers. The standard 8-bit data transfer has 4 WS in the memory read 
cycle time. As a result, the default memory cycle time is 6 clocks. The standard 16-bit 
data transfer uses 1 WS in the memory cycle time. That results in 3 clocks for the 
read/write cycle time. To shorten the memory cycle time we use the ZEROWS pin as 
explained next. 


ZEROWS (zero wait state) 


This is an input signal and is active low. The standard 16-bit ISA bus cycle time 
contains one WS unless ZEROWS is activated. By activating this pin (making it low), we 
are telling the CPU that the present memory cycle can be completed without a wait state. 
That results in performing the bus cycle time in 2 clocks. The standard 8-bit ISA bus cycle 
time contains 4 WS unless ZEROWS is activated. That means that when both MEMCS16
and ZEROWS are high (without asserting them low), the 8-bit data bus (D0-D7) is being 
used and the data transfer is completed in 6 clocks. This default 8-bit read/write cycle time 
with its 4 WS is sufficient for interfacing even slow ROMs to the ISA bus. If MEMCS16 
= 1 and ZEROWS = 0, the 8-bit memory cycle time has 1 WS instead of 4. 


SYSCLK (system clock) 


This is an output clock providing the standard 8 MHz ISA bus clock. The 8
MHz clock results in a 125 ns (1/8 MHz = 125 ns) period and all memory and I/O ISA 
bus timing is based on this. Therefore, a zero WS read cycle time for a 16-bit bus takes 2 
x 125 ns = 250 ns. The standard 8-bit bus with its 4 WS will be (2 + 4) x 125 ns = 750 ns. 
The 8-bit bus with ZEROWS asserted low has 1 WS, making its memory cycle time (2 + 
1) x 125 ns = 375 ns. The SYSCLK pin is located on the B side of the 62-pin section of 
the ISA bus. 
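
The cycle times quoted above all follow from one rule: 2 base clocks plus the number of wait states, multiplied by the 125 ns SYSCLK period. The sketch below encodes the wait-state combinations described in the text; the function and parameter names are illustrative, and each flag is True when the corresponding active-low pin is asserted.

```python
SYSCLK_HZ = 8_000_000
CLOCK_NS = 1e9 / SYSCLK_HZ  # 125 ns per clock


def wait_states(memcs16_low, zerows_low):
    """Wait states for a memory cycle, per the MEMCS16 and ZEROWS descriptions."""
    if memcs16_low:                     # 16-bit transfer
        return 0 if zerows_low else 1
    return 1 if zerows_low else 4       # 8-bit transfer (default)


def cycle_time_ns(memcs16_low, zerows_low):
    """Memory cycle time: (2 base clocks + wait states) x clock period."""
    return (2 + wait_states(memcs16_low, zerows_low)) * CLOCK_NS


for memcs16_low in (False, True):
    for zerows_low in (False, True):
        width = 16 if memcs16_low else 8
        ws = wait_states(memcs16_low, zerows_low)
        t = cycle_time_ns(memcs16_low, zerows_low)
        print(f"{width:2d}-bit, {ws} WS: {t:.0f} ns")
```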


IOCHRDY (IO Channel Ready) 


This is an input signal into the ISA bus and is active low. By driving it low, we 
are asking the system to extend the standard ISA bus cycle time. In response to the assert- 
ing of this signal, the system will insert wait states into the memory (or I/O) cycle time 
until it is deasserted. This is rarely used for memory interfacing since the standard mem- 
ory cycle time of the ISA bus provides plenty of time. 


ISA bus timing for memory 


Suppose that we are designing a data acquisition board for an ISA expansion slot, 
and the board requires ROM. What would be the best approach? We can use either the 8- 
bit or 16-bit data section of the ISA bus. There are some major differences between them 
that must be noted. Next we look at each separately. 


8-bit memory timing for ISA bus 


In the case of using the 8-bit data bus, we use D0-D7 and A0-A19 of the 62-pin
section of the bus. More importantly, the ISA bus provides plenty of time for slow memories
by inserting wait states into the read cycle time. First, we must remember that the
memory cycle time is only 2 clocks (with 0 WS) and the maximum speed is 8 MHz. Since
1/8 MHz = 125 ns, the bus cycle time is 2 x 125 = 250 ns. In the case of the standard 8-bit
read/write cycle time, the chipset inserts 4 WS clocks into the read cycle time as shown
in Figure 26-2. In the case where ZEROWS is asserted low and MEMCS16 = 1, the
read/write cycle time has 1 WS for the 8-bit bus as shown in Figure 26-3.


[Figure 26-2: timing diagram of the standard 8-bit read cycle. Signals: SYSCLK, BALE,
LA[23:17], SA[19:0], SBHE, MEMR, MEMCS16, ZEROWS, SD[7:0] (read).]


[Timing diagram. Signals: ISACLK2, SYSCLK, BALE, LA[23:17], SA[19:0], SBHE, MEMR,
MEMCS16, ZEROWS, SD[7:0] (read).]


Figure 26-3. Zero WS 8-bit ISA Memory Read Cycle Time (1 WS)


Figure 26-4 shows the 8-bit memory interfacing for the ISA bus. 


[Diagram: ROM connected to the ISA bus. Signals: ROMCS0#, MEMR#, SA[17:0].]


Figure 26-4. 8-bit ROM Connection to ISA Bus


16-bit memory timing for ISA bus


As mentioned earlier, the 8-bit data transfer is the default mode for the ISA bus 
expansion slots. In order to perform the 16-bit data transfer using lines D0-D15, we must
assert the MEMCS16 pin low. A 16-bit data bus transfer is twice as fast as an 8-bit data
bus transfer. However, it also requires twice the board space in addition to having a high- 
er power consumption. The 16-bit data read cycle time for the ISA bus with zero WS is 
shown in Figure 26-6. Notice that in all the 2-clock bus cycle CPUs and systems, the first 
clock is set aside for the addresses and the second clock is for data. In order to get a zero 
wait state bus activity, the ZEROWS pin (active low) must be activated. This is because 
the standard ISA bus cycle contains one wait state. The standard 16-bit ISA bus cycle tim- 
ing with 1 WS is shown in Figure 26-5, and Figure 26-6 shows the ISA cycle with 0 WS. 
Combining the effect of MEMCS16 and ZEROWS pins gives the information in Table 26- 
1. See Example 26-1. 


Table 26-1: ISA Bus Memory Read/Write Cycle Time

MEMCS16    ZEROWS    Read Cycle Time
1          1         6 clocks (4 WS)
1          0         3 clocks (1 WS)
0          1         3 clocks (1 WS)
0          0         2 clocks (0 WS)


Example 26-1 


Calculate the bus bandwidth of the ISA bus for (a) 0 WS, and (b) 1 WS. Assume that all the 
data transfers are 16-bit (D0-D15). 
Solution: 


Since the ISA bus speed is 8 MHz, we have 1/8 MHz = 125 ns as a clock period. The bus cycle 
time for zero wait state is 2 clocks; therefore, we have: 


(a) ISA bus cycle time with 0 WS is 2 x 125 ns = 250 ns. 

Bus bandwidth = 1/250 ns x 2 bytes = 8 megabytes/second. 
(b) ISA bus cycle time with 1 WS is 250 ns + 125 ns = 375 ns. 

Bus bandwidth = 1/375 ns x 2 bytes = 5.33 megabytes/second. 
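
The arithmetic of Example 26-1 can be checked with a short script. This is only a sketch: the function name isa_bandwidth and its parameters are illustrative choices, not anything defined by the ISA specification, and it assumes every cycle takes 2 clocks plus the stated wait states.

```python
ISA_CLOCK_HZ = 8_000_000  # maximum ISA bus clock given in the text


def isa_bandwidth(wait_states, bytes_per_transfer, clock_hz=ISA_CLOCK_HZ):
    """Return (cycle time in ns, bandwidth in megabytes/second)."""
    clocks = 2 + wait_states  # a zero-wait-state cycle takes 2 clocks
    cycle_ns = clocks * 1e9 / clock_hz
    mb_per_s = bytes_per_transfer * clock_hz / clocks / 1e6
    return cycle_ns, mb_per_s


# 16-bit transfers (2 bytes per cycle) with 0 WS and 1 WS
for ws in (0, 1):
    cycle_ns, mb = isa_bandwidth(ws, 2)
    print(f"{ws} WS: cycle = {cycle_ns:.0f} ns, bandwidth = {mb:.2f} MB/s")
```

Changing the second argument to 1 gives the corresponding figures for 8-bit transfers.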


Figure 26-5. Standard 16-bit ISA Memory Read Cycle Time (1 WS)


Figure 26-6. Zero WS 16-bit ISA Memory Read Cycle Time (0 WS)


Another major issue in 16-bit ISA bus interfacing is the problem of odd and even
banks. For example, assume that we are interfacing two ROM chips to D0-D15 of the ISA
bus, one connected to D0-D7 and the other to D8-D15. We must divide our information
(code or data) into two parts and burn each part into one of the ROMs. In many ROM
burners, there is an option for splitting the data into odd- and even-addressed bytes
to support 16-bit data systems. The 16-bit data connection to two ROMs for the ISA
slot is shown in Figure 26-7.
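
The odd/even split performed by a ROM burner can be sketched as follows. The function names and the sample bytes are hypothetical, and the sketch assumes that the even-addressed bytes go to the ROM on D0-D7 and the odd-addressed bytes to the ROM on D8-D15.

```python
def split_even_odd(image: bytes):
    """Split a byte image into even-addressed and odd-addressed halves."""
    even = image[0::2]  # addresses 0, 2, 4, ... -> ROM wired to D0-D7
    odd = image[1::2]   # addresses 1, 3, 5, ... -> ROM wired to D8-D15
    return even, odd


def merge_even_odd(even: bytes, odd: bytes) -> bytes:
    """Rebuild the original image, as the 16-bit bus sees the two ROMs."""
    merged = bytearray()
    for i in range(len(even)):
        merged.append(even[i])
        if i < len(odd):
            merged.append(odd[i])
    return bytes(merged)


image = bytes([0xF6, 0x98, 0x34, 0x12, 0xAA])  # made-up sample data
even, odd = split_even_odd(image)
print(even.hex(), odd.hex())
```

Merging the two halves again is a quick way to confirm that nothing was lost in the split.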


DIMM and SIMM memory modules 


In the 1980s, PCs had sockets on the motherboard for DRAM chips. To expand
the memory of a PC, you had to buy memory chips and plug them into the sockets. With
the introduction of the 16-bit ISA bus, memory expansion boards became common. The
problem with the memory expansion cards was that you were limited to the bus speed of
the ISA expansion slot no matter how fast the memory chip. This led to the idea of a
memory module as a way to expand memory for x86 PCs (386 and higher). The connectors
for memory modules are much smaller in size and accommodate much faster memory than
do ISA connectors. It is also much easier to insert them into the motherboard. The
only problem was the lack of a standard connector. This was solved with the
introduction of SIP (single in-line package). Later, the SIMM (single in-line memory
module) and DIMM (dual in-line memory module) were introduced. Currently, SIP is no
longer in use, and SIMM and DIMM are the dominant memory modules. It is important to
notice that the use of memory modules frees the motherboard designer from the
agonizing choice of which organization and speed of DRAM to use. All that is required
is to incorporate various organizations and speeds into the design of the motherboard
and let the user select options via the CMOS set-up process.


Figure 26-7. 16-bit ISA Bus Connection to ROM


ROM duplicate and x86 PC memory map 


The memory map for the 1-megabyte memory range 00000H to FFFFFH is the
same for all x86-based PCs. See Figure 26-8. As was shown in Chapter 9, when 286
microprocessors are powered up, the CPU is in real mode and fetches the first opcode
from physical memory location FFFFF0H. This is because CS = F000H, IP = FFF0H, and
A20-A23 are all high. This leads to a physical address of FFFFF0H, which is 16 bytes
below the top of the 16-megabyte maximum memory range of the 286 (which ends at
FFFFFFH). After execution of the first opcode, address pins A20-A23 all become 0s.
Address pins A20-A23 will not be activated again unless the 286 mode of operation is
changed to protected mode. In other words, when the CPU wakes up in real mode at
address FFFF0H, the first opcode is fetched from FFFFF0H because A20-A23 are all
high. This is one reason that there is an exact duplicate of ROM at addresses
0F0000-0FFFFF and FF0000-FFFFFF. This duplication allows access of BIOS ROM in both
real and protected modes. This concept applies to the 386/486 and Pentium PCs and is
shown in Figure 26-9a. In these processors, the 32-bit address bus provides the memory
space of 00000000 to FFFFFFFFH. We can verify this by using the system tools software
that comes with Windows. See Figure 26-9b. You can experiment with this by going to
Accessories, clicking on System Tools, and then clicking on System Information. Click
on Hardware Resources and then click on Memory.
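
The reset-address arithmetic described above can be checked with a few lines of code. This is a sketch only: the function names are illustrative, and masking the real-mode result to 20 bits is a simplifying assumption of this model.

```python
def real_mode_address(cs, ip):
    """Real-mode physical address: segment shifted left 4 bits plus offset."""
    return ((cs << 4) + ip) & 0xFFFFF  # 20-bit result (assumption of this model)


def reset_fetch_address(cs=0xF000, ip=0xFFF0, upper_lines_high=True):
    """Address seen on the 286 bus for the first fetch, with A20-A23 high."""
    addr = real_mode_address(cs, ip)
    if upper_lines_high:
        addr |= 0xF00000  # A20-A23 all high until the first opcode has executed
    return addr


print(hex(real_mode_address(0xF000, 0xFFF0)))  # 0xffff0
print(hex(reset_fetch_address()))              # 0xfffff0
```

The two results fall in the two ROM ranges named in the text, 0F0000-0FFFFF and FF0000-FFFFFF, which is why the ROM must appear at both.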


[Memory map figure: address ranges starting at 00000000, followed by
000A0000-000BFFFF, 000C0000-000DFFFF, 000E0000-000EFFFF, and 000F0000-000FFFFF,
with a region labeled "Duplicate"]


Figure 26-9b. x86 PC Memory Map from System Information Tool in Windows


Shadow RAM


By using System Tools to explore the system memory of Pentium PCs, we can see
the duplicates of ROM in the RAM memory space. The reason is that the ROM access time
is too slow for the 100 MHz bus speed. To speed up ROM access, its contents are copied
into RAM and write protected. This is called shadow RAM and provides the ROM's
contents to the CPU at a much faster speed than ROM. From this point forward, every
time the CPU needs to access the ROM's contents, it will get the information from RAM
at a very high speed. As long as the PC is on, the DRAM containing the ROM information
is write protected and will not be corrupted. It must be noted that the process of
creating shadow RAM is done when the system is booted, and when the PC is turned off,
shadow RAM's contents are lost.


Review Questions 


1. The MEMCS16 pin of the ISA bus is an active-______ (low, high) signal.
2. If MEMCS16 = high, which portion of the data bus is used?
3. The ZEROWS pin of the ISA bus is an active-______ (low, high) signal.
4. If ZEROWS = high, the 8-bit memory cycle time takes ______.
5. The ISA bus has a maximum frequency of 8 MHz. Find its bus bandwidth for 0 WS.


SECTION 26.2: I/O BUS TIMING IN ISA BUS 


As we have seen, interfacing 8-bit devices such as the 8255 to the x86 PC is a 
straightforward process because the 8-bit data pins match D0-D7 of the ISA bus. Just as 
in 16-bit memory interfacing, there is a problem when we want to use the 16-bit (D0—D15) 
data bus of the x86 for I/O operations. In this section we look at the timing and design of 
16-bit data I/O and compare it with 8-bit I/O operations. First, we examine a few issues 
concerning the ISA bus. 


8-bit and 16-bit I/O in ISA bus 


The term ISA computers encompasses IBM PC AT, PS, and any x86 PCs with AT-type
expansion slots, as explored in Chapter 9. These computers could use the 286, 386,
486, Intel Pentium, or any x86 microprocessor from AMD as their CPU, but they have
ISA-type expansion slots. The following points must be noted about these types of
computers:


1. In communications between the x86 CPU and I/O ports, typically I/O devices are slow
and cannot respond at the CPU's normal speed. In such situations, wait states must be
inserted into the I/O cycle. The 80286 and all higher microprocessors (386, 486,
Pentium, etc.) have two clocks for the I/O cycle time when they are designed with 0
WS. In this regard, it is the same as memory cycle time for such processors. For
example, a 100-MHz Pentium processor provides a total of 20 ns (2 x 10 ns since
1/100 MHz = 10 nanoseconds) for the I/O cycle time. While in recent years memory
speed has been increasing steadily, there has not been a corresponding increase in the
speed of I/O components such as ADCs (analog-to-digital converters). In general, I/O
devices are much slower than memory since the I/O is interfaced with nature whereas
memory is a semiconductor device. For example, a temperature sensor converts
temperatures to voltage levels and the voltages are converted to binary numbers using
ADCs before they are provided to the CPU. The delays associated with each stage of
conversion add to the I/O response time, causing a severe bottleneck. This is only one
of the reasons why the ISA expansion slot speed is limited to 8 MHz, in spite of the
fact that the CPU bus speed is 100 MHz in the Pentium. Therefore, to interface with
slow I/O devices, one must insert wait states into the I/O cycle time to match the
device speed.

2. When the CPU communicates with an ISA expansion slot, be it memory or a periph- 
eral I/O port, it can use only an 8- or 16-bit data bus, even if the CPU is 32-bit, such 
as a 386, 486, or Pentium. In 386 and 486 PCs where the CPU has a 32-bit data bus, 
memory (or even a peripheral) on the motherboard uses a 32-bit data bus, but when it 
goes to an ISA expansion slot it must use a 16-bit bus. When designing a plug-in card 
for the ISA expansion slot of a motherboard with a 32-or 64-bit CPU, we use the 8- 
or 16-bit data bus but not the 32/64 bit data buses. To access the x86’s entire 32- or 
64-bit data bus through the expansion slot, we must use a PCI slot. This is discussed 
in Section 26.3. 

3. The ISA bus speed is limited to 8 MHz. It does not matter that the x86 CPU works on 
a frequency of 10 MHz or 100 MHz or even 1 GHz: When it communicates with 
devices (memory or I/O ports) through the ISA expansion slots it must slow down to 
8 MHz. That means inserting many wait states in order to access the boards connect- 
ed to ISA expansion slots. The good news is that the chipset on the motherboard will 
do all the above tasks. 


These three limitations are commonly referred to as the I/O bottleneck since they
slow down the flow of information to/from the CPU when using I/O devices.


I/O signals of the ISA bus


Just as in memory, the ISA bus supports both 8- and 16-bit I/O operations.
Furthermore, the problems associated with 8- and 16-bit data memory interfacing
discussed in Chapter 10 also exist in I/O interfacing of the ISA bus. The following
ISA slot pins are associated with I/O interfacing and must be understood.


SA0-SA9 (system address)


The system address bus A0-A9 provides the I/O addresses. This limits the I/O
addresses supported by the ISA slot to 1024 ports.


SD0-SD7 (system data bus) 


The system D0-D7 8-bit data bus or D0-D15 16-bit data bus provides the data
path between the CPU and the I/O device.


IOR and IOW 


The IOR and IOW control signals are both active low and are connected to the 
read and write pins of I/O devices. 


IOCS16 (I/O chip select 16) 


This is an input into the ISA bus and is an active-low signal. It informs the
system that the I/O operation uses the entire 16-bit data bus D0-D15. If this input is
not driven low, the ISA bus uses the 8-bit data bus D0-D7. The IOCS16 input pin is
used to tell the motherboard circuitry that the present I/O cycle is a 16-bit data
transfer. When the data transfer is a 16-bit transfer to an 8-bit peripheral, if the
IOCS16 pin is not pulled low the data transfer is performed in two consecutive I/O
cycles, each transferring one byte at a time, which requires more time. Add-in cards
with a 16-bit data path use the IOCS16 pin to instruct the motherboard not to convert
a word transfer into two byte transfers. This pin must be driven with an open
collector or tri-state driver capable of sinking 20 mA. Bus data transfers are
summarized in Table 26-2.


Table 26-2: ISA Bus Data Transfer Summary 


from Source to Destination          Number of Clocks per Cycle


Reminder: In 286/386/486/Pentium machines, the memory (or I/O) cycle consists of 2
clocks when it is a zero-wait-state cycle.


ZEROWS (zero wait state) 


ZEROWS is an input pin into the ISA bus and is active low. If this signal is driv- 
en low, it tells the system that I/O and memory operations can be completed without any 
WS. We will see how this applies to I/O operation in this section. 


IOCHRDY (I/O channel ready) 


This is an input signal into the ISA bus and is active low. By driving it low, we 
are asking the system to extend the standard ISA bus cycle. In response to asserting this 
pin, the system will insert wait states into the I/O or memory cycle until it is deasserted. 
The function of this pin is the opposite of ZEROWS. In other words, by asserting this pin 
we are extending the I/O (or memory) read and write cycle time to allow the interfacing 
of slow devices to the ISA bus. 




[ISA slot pinout figure. Signal names recovered from the figure: -I/O CH CK, SD7-SD0,
-I/O CH RDY, AEN, and SA19-SA0 on the 62-pin section; SBHE, LA23-LA17, -MEMR, -MEMW,
and SD08-SD15 on the 36-pin section]


Figure 26-10. ISA (IBM PC AT) Bus Slot Signals 


(Reprinted by permission from “IBM Technical Reference” c. 1985 by International Business Machines Corporation) 




Figure 26-11. Standard 8-bit ISA I/O Read Cycle Time (4 WS) 


Figure 26-12. Zero WS 8-bit ISA I/O Read Cycle Time (1 WS)


8-bit I/O timing and operation in ISA bus


It would be helpful to review the discussion of 8-bit memory operation and
interfacing in Chapter 10 since memory and I/O operations are very similar. The I/O
operation of the ISA bus defaults to 8-bit and uses the D0-D7 data bus to transfer
data between the I/O device and the CPU. It completes the read (or write) cycle in 6
(2 + 4 WS) clocks if ZEROWS is not asserted low. In other words, the default I/O
operation for the ISA bus has 4 WS and uses the 8-bit data bus. This is the 8-bit
standard I/O operation and is shown in Figure 26-11. With a maximum of 8 MHz for the
ISA bus clock, I/O takes a total of 125 ns x 6 = 750 ns. Just like memory cycle time,
we can shorten the I/O cycle time by asserting the ZEROWS pin low. This will cause the
I/O operation to be completed in 3 (2 + 1 WS) clocks instead of 6 (2 + 4 WS), as shown
in Figure 26-12. If the default 4 WS I/O cycle time is not long enough, we can extend
it by driving the IOCHRDY pin low. The extension lasts as long as the IOCHRDY pin is
low. The maximum extension is limited to 10 WS.


16-bit I/O operation and timing in ISA bus 


Review the discussion of 16-bit memory operations and interfacing in Chapter 10
since memory and I/O operations are very similar. Just as in memory, the 16-bit I/O
port uses data bus D0-D15 to transfer data between the CPU and I/O devices. It must be
noted that in the 16-bit I/O operation, the low byte uses D0-D7 and the high byte uses
the D8-D15 data bus. First let's look at the 16-bit I/O instructions supported by the
x86 family. The following is the 16-bit I/O instruction format.


16-bit data ports instruction


      Inputting Data        Outputting Data
(1)   IN  AX,port#          OUT port#,AX
(2)   MOV DX,port#          MOV DX,port#
      IN  AX,DX             OUT DX,AX


Notice that we must use AX instead of AL in the 16-bit I/O. To use 16-bit data 
I/O, we need two port addresses, one for each byte. Again, this is because I/O space is byte 
addressable, just like memory space. Look at the following: 


MOV AX,98F6H
OUT 40H,AX ;send out AX to ports 40H & 41H


In this case F6H, the content of AL, goes to port address 40H while 98H, the con- 
tent of AH, is transferred to port address 41H. The low byte goes to the low port address 
and the high byte to the high port address. This is exactly like memory data transfers in 
that the low byte goes to the low address location and the high byte goes to the high 
address location (the little endian convention). This principle works the same for 16-bit 
port addresses as shown below: 


MOV DX,310H
MOV AX,98F6H
OUT DX,AX ;send out AX to ports 310H & 311H


Figure 26-13. Standard 16-bit ISA I/O Read Cycle Time (1 WS)


Again the F6H is sent to port address 310H using data path D0-D7 and 98H goes
to port address 311H using the D8-D15 data path. Next we will contrast 8-bit and
16-bit I/O via the ISA bus.
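
The way a 16-bit OUT spreads AX over two consecutive port addresses can be modeled in a few lines. This is an illustrative sketch, not a description of any real driver interface; the function name word_out is made up for the example.

```python
def word_out(port, value):
    """Return the (port, byte) pairs produced by a 16-bit OUT of value to port."""
    low = value & 0xFF           # content of AL, carried on D0-D7
    high = (value >> 8) & 0xFF   # content of AH, carried on D8-D15
    return [(port, low), (port + 1, high)]


# The two cases discussed in the text: fixed port 40H and DX = 310H
for port in (0x40, 0x310):
    for p, b in word_out(port, 0x98F6):
        print(f"port {p:03X}H <- {b:02X}H")
```

The low byte always lands at the lower port address, which is the little endian convention mentioned above.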


16-bit I/O timing and operation via ISA bus 


As mentioned earlier, 8-bit data transfer is the default mode for the ISA bus 
expansion slots. To perform 16-bit data transfers using D0-D15, we must assert the 
IOCS16 pin low. The 16-bit bus transfers data twice as fast as an 8-bit data bus. However, 
it requires twice the board space in addition to an increase in power consumption. The 16- 
bit data read cycle time for the ISA bus with one WS is shown in Figure 26-13. This is the 
standard read cycle time in the ISA bus. In other words, unlike memory, you cannot get 
zero wait state bus activity for 16-bit I/O operations. Therefore, ZEROWS has no effect 
on 16-bit I/O operations, and the standard 16-bit ISA bus cycle timing is completed in 3 
clocks. While you cannot shorten the 16-bit I/O bus cycle time, you can extend it by 
asserting the IOCHRDY pin. 


I/O bus bandwidth for ISA


The I/O bus bandwidth is the rate of data transfer between the CPU and I/O
devices and is dictated by the bus speed and the data bus width used. Example 26-2
shows the calculation of bus bandwidth for the ISA bus.


Example 26-2


Find the ISA bus bandwidth for (a) 8-bit standard, (b) 8-bit with ZEROWS asserted, and 
(c) 16-bit standard. 


Solution: 
Since the ISA bus speed is 8 MHz, we have 1/8 MHz = 125 nanoseconds for the bus clock. 


(a) The standard 8-bit I/O bus cycle for ISA uses 4 WS. Therefore, the bus cycle time is 6 clocks.
Now cycle time = 6 x 125 ns = 750 ns and the bus bandwidth is 1/750 ns x 1 byte = 1.33
megabytes/second.


(b) If ZEROWS is asserted in 8-bit I/O, we have 3 clocks for the I/O cycle time. Therefore, we
have a cycle time of 3 x 125 ns = 375 ns and 1/375 ns x 1 byte = 2.67 megabytes/second for bus
bandwidth.


(c) For 16-bit I/O data transfers we must assert the IOCS16 pin. The I/O cycle time is 3 clocks. 
Therefore, we have a bus bandwidth of 1 / (3 x 125) x 2 = 5.33 megabytes/second. For 16-bit 
I/O, we cannot assert ZEROWS to shorten the cycle time. 
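
The three cases of Example 26-2 can be tabulated with a short script. It is a sketch under the example's own assumptions (125 ns clock, clock counts as stated); the names cases and bandwidth_mb_per_s are illustrative.

```python
ISA_CLOCK_NS = 125  # 1/8 MHz

# (description, clocks per I/O cycle, bytes per transfer) as stated in Example 26-2
cases = [
    ("8-bit standard (4 WS)", 6, 1),
    ("8-bit with ZEROWS (1 WS)", 3, 1),
    ("16-bit standard (1 WS)", 3, 2),
]


def bandwidth_mb_per_s(clocks, nbytes, clock_ns=ISA_CLOCK_NS):
    """Bytes per nanosecond scaled to megabytes per second."""
    cycle_ns = clocks * clock_ns
    return nbytes * 1000 / cycle_ns  # 1 byte/ns = 1000 megabytes/second


for name, clocks, nbytes in cases:
    mb = bandwidth_mb_per_s(clocks, nbytes)
    print(f"{name}: {clocks * ISA_CLOCK_NS} ns per cycle, {mb:.2f} MB/s")
```

The 16-bit standard cycle and the 8-bit ZEROWS cycle take the same 3 clocks, so the 16-bit case is faster only because it moves two bytes per cycle.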


Interfacing 8-bit peripherals to a 16-bit data bus


As mentioned in the last chapter, microprocessors with a 16-bit data bus use odd
and even byte spaces. This is done with the help of the A0 and BHE pins as shown below:


BHE   A0
0     0     Even-addressed words (uses D0-D15)
0     1     Odd-addressed byte (uses D8-D15)
1     0     Even-addressed byte (uses D0-D7)
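
The BHE/A0 table above can be expressed as a small lookup. This is a sketch: the function names are illustrative, and the combination BHE = 1, A0 = 1 is rejected simply because the table does not list it.

```python
def active_lanes(bhe, a0):
    """Decode active-low BHE and address bit A0 into the data lines used."""
    table = {
        (0, 0): "D0-D15",  # even-addressed word
        (0, 1): "D8-D15",  # odd-addressed byte
        (1, 0): "D0-D7",   # even-addressed byte
    }
    if (bhe, a0) not in table:
        raise ValueError("BHE = 1, A0 = 1 is not listed in the table")
    return table[(bhe, a0)]


def byte_access(port):
    """Return (BHE, A0, lanes) for a single-byte access to the given port."""
    a0 = port & 1
    bhe = 0 if a0 else 1  # the upper byte is enabled only for odd addresses
    return bhe, a0, active_lanes(bhe, a0)


print(byte_access(0x74))  # (1, 0, 'D0-D7')
print(byte_access(0x75))  # (0, 1, 'D8-D15')
```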


In interfacing 16-bit I/O, the main issue is how to deal with odd- and even-address
ports. The fact that data for even-addressed ports is carried on data bus D0-D7 and
data for odd-addressed ports is carried on D8-D15 makes port design a challenging
issue. There are two solutions to this problem.


1. Simply use two separate PPI devices, such as the 8255. One is used for odd
addresses and the other for even addresses. For example, in a design using this
method, if port 74H is assigned to port A of the 8255, then port B has the address
76H, port C the address 78H, and so on. Another problem is outputting the contents of
register AX in an instruction such as "OUT 76H,AX". In this case, AL is carried to the
8255 with even port addresses on D0-D7 and AH is carried to the other 8255 with odd
port addresses on D8-D15. This is extremely awkward and confusing for the programmer.
Figure 26-14 shows the 8255 with odd and even port addresses.


2. The second solution is to connect all 8-bit peripheral ports to data bus D0-D7.
This is exactly what IBM PC/AT designers, and indeed all makers of the x86 ISA bus,
have done. In such a design, one problem must be solved. What happens when
instructions such as "OUT 75H,AL" are executed? This is an odd-addressed port and the
data is provided by the CPU on D8-D15, but the port is connected to D0-D7. To solve
this problem, one must use a latch to grab the data from bus D8-D15 and provide it to
D0-D7, where the port is connected. The latch responsible for this is called the Hi/Lo
byte copier in ISA bus literature. In order for the Hi/Lo byte copier to work
properly, it needs some logic circuitry. It is the function of the bus control logic
circuitry to detect the following cases and activate the Hi/Lo byte copier.




Figure 26-14. Odd and Even Ports with the 8255 


Case 1: Outputting a byte to odd-addressed ports 


To write a byte to an odd-addressed port, the CPU provides the data on its upper
data bus (D8-D15) and makes A0 = 1 and BHE = 0 since the port address is an odd
address. For example, in the instruction "OUT 41H,AL", the contents of AL are provided
to D8-D15 while BHE = 0 and A7-A0 = 0100 0001. The bus control logic circuitry senses
that the CPU is trying to send 8-bit data to an odd address through its D8-D15 data
bus. It activates the Hi/Lo byte copier, which copies the data from D8-D15 to D0-D7.
The data is then presented to the 8-bit peripheral device, which is connected to the
lower data bus, D0-D7.


Case 2: Inputting a byte from odd-addressed ports 


To read a byte from an odd-addressed port, the CPU expects to receive the data
on its upper data bus (D8-D15) and makes A0 = 1 and BHE = 0. For example, in the
instruction "IN AL,43H", A7-A0 = 0100 0011 and BHE = 0. The CPU expects the data to
come in through D8-D15, but the input port device is connected to D0-D7. The bus
control logic circuitry senses that the CPU is trying to get 8 bits of data from a
peripheral device through its D8-D15 data pins. It activates the Hi/Lo byte copier,
which copies the data from D0-D7 to D8-D15, and the data is presented to the CPU.
The details of bus control logic circuits are quite involved and in today's PCs are
buried in the chipsets of x86 PCs.
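
The two cases handled by the Hi/Lo byte copier can be summarized in a small behavioral model. This is a sketch only, assuming every 8-bit peripheral is wired to D0-D7 as described above; the function names are hypothetical and the real bus control logic is far more involved.

```python
def needs_byte_copy(port, peripheral_on_low_lanes=True):
    """True when a byte access to an odd port must cross between bus halves."""
    return peripheral_on_low_lanes and (port & 1) == 1


def out_byte(port, al):
    """Model 'OUT port,AL': return the byte seen by a peripheral on D0-D7."""
    # The CPU drives D8-D15 for odd addresses and D0-D7 for even addresses.
    upper, lower = (al, None) if port & 1 else (None, al)
    if needs_byte_copy(port):
        lower = upper  # Case 1: copier moves the byte from D8-D15 to D0-D7
    return lower


def in_byte(port, device_byte):
    """Model 'IN AL,port': return the byte the CPU receives."""
    lower = device_byte  # the peripheral always drives D0-D7
    if needs_byte_copy(port):
        upper = lower    # Case 2: copier moves the byte from D0-D7 to D8-D15
        return upper
    return lower


print(needs_byte_copy(0x41), needs_byte_copy(0x40))
print(hex(out_byte(0x41, 0x5A)), hex(in_byte(0x43, 0x7E)))
```

In both directions the peripheral sees or supplies the same byte; the copier only changes which half of the bus carries it.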


Limitations of the ISA bus 


In 1984 IBM extended the life of the PC/XT bus by adding an extra 36 pins. 
Although this made the AT bus a 16-bit bus, it did not solve some other problems associ- 
ated with the AT bus. In 1985 with the introduction of the 386 chip, a microprocessor with 
a 32-bit data bus, it was obvious that something had to be done about the limitations of 
the AT bus. The limitations of the ISA are as follows. 


1. The data path is limited to 16 bits; therefore, it is unable to accommodate the 32-bit 
data bus of the 386/486/Pentium microprocessors. 

2. The 24-bit address bus limits the maximum memory accessible through the expan- 
sion slot to 16M. Therefore, it is unable to accommodate the 32-bit address bus 
(4-gigabyte address space) of the 386/Pentium. 

3. In the ISA motherboard, there could be up to 8 ISA expansion slots. The expansion 
slot is bulky and has a large surface contact, resulting in a massive amount of capac- 
itance and inductance load on each signal. The accumulated capacitance and induc- 
tance associated with all the slots, plus the problem of crosstalk, limits the working 
frequency of the expansion slot of the ISA bus to 8 MHz. That means that the CPU 
can be 20, or 33, or even 500 MHz, but when it is communicating with the expan- 
sion slot it must slow down to 8 MHz. The absence of extra ground pins to reduce 
the effects of crosstalk and radio-frequency emissions makes the ISA bus irre- 
deemable for good. 

4. Since the interrupts (IRQs) are edge triggered, each can be assigned only to a single 
device and there cannot be any sharing of the interrupt between two or more devices. 
In high-frequency systems, the edge-triggered interrupt can also result in false acti- 
vation of the interrupt due to a spike or noise on the IRQ input. 

5. The PC/XT had three 8-bit channels (channels 1-3) for DMA as was shown in
Chapter 15. Channel 0 was used for DRAM refreshing. The ISA released channel 0
from the task of refreshing the DRAM and added three more DMA channels, all with
16-bit data transfer capability. This made the ISA bus capable of handling a total of
7 DMA channels, four 8-bit channels and three 16-bit channels. Another major problem
of DMA channels in ISA is the 16M address space limitation, due to the availability
of only the A0-A23 address bus. This means that 386/486/Pentium machines, with
their 4-gigabyte memory space, cannot use DMA bus activity to transfer data
to memory space located beyond 16M.


The combined effects of the above limitations mean that the performance of a
system with a powerful and fast microprocessor such as the 386/486/Pentium is limited
by its expansion slot and system design. This fact led IBM and other PC makers to
search for new solutions. While IBM decided to design a whole new bus standard,
radically different from the ISA bus, called IBM Micro Channel, other PC makers
decided either to go for a local bus or to extend and improve the ISA bus, which they
called the EISA bus. Micro Channel was not made an open architecture by IBM. The
industry developed other, more powerful buses; consequently, Micro Channel never
became popular and it was eventually discontinued by IBM. The EISA bus was the
PC-compatible makers' enhancement to the ISA bus. Although many of the limitations of
the ISA bus discussed earlier were removed from the EISA bus, there remained one major
one in that the EISA bus was limited to 8 MHz, just like the ISA. This is due to the
fact that the PC industry wanted to keep the EISA bus ISA-bus compatible, down to the
smallest detail. The EISA bus was in reality an upgrade of ISA and consequently
carried many of its limitations, but it had the A24-A31 address lines and D16-D31 data
lines to accommodate the 386/486/Pentium. The low speed and other limitations of the
EISA bus led the x86 PC makers to come up with a whole new bus. It is called PCI and
we discuss it in the next section.


PC104 bus and embedded PC 


The ISA bus has found a new life in the form of the PC104 bus. The name PC104 
comes from the fact that 6 additional pins are added to the ISA making it a 104-pin bus. 
Since ISA has 62 pins for the A & B sides and 36 pins for the C & D sides we have a total 
of 98 pins for the ISA. Of the 6 additional pins, 5 of them are ground and 1 is the key. See 
Table 26-3. Examining Table 26-3, we see that all the ISA signals are the same as for 
PC104. The only difference is the extra ground pins and the key pin. It must be noted that 
the voltage and current characteristics of PC104 are the same as for ISA signals. The 
PC104 form-factor is much smaller than the original ISA form-factor and is widely used 
in embedded systems due to lower power consumption. 


For more information on PC104 see http://www.pc104.org. 


Review Questions 


1. In the ISA bus, we use address ______ to locate the I/O device.
2. What is the maximum number of I/O devices the ISA bus supports?
3. What is the role of the ZEROWS pin for I/O cycle time?
4. The IOCS16 pin is an ______ (input, output) signal.
5. To use the 16-bit I/O capability of the ISA bus we must assert pin ______.
6. What is the minimum I/O cycle time for the 8-bit ISA bus?
7. What is the maximum bus bandwidth for the ISA bus?


SECTION 26.3: PCI BUS 




In this section we provide an overview of the PCI bus. But before we do that we 
need to explain some widely used terminology. 


Master and slave 


Many devices connected together communicate with each other through address, 
data, and control buses. When one device wishes to communicate with another, it sends 
an address to distinguish it from others since each device is assigned a unique address. It 
also sends a read or write signal to indicate its intention. The master device is the one that 
initiates and controls the communication while the responding device is called the slave. 
In x86-based PCs, the CPU is an example of a master and memory is an example of a 
slave. 


Bus arbitration 


There is only one set of global address, data, and control buses available in a 
given system. This means that requests by more than one master to use the buses must be 
arbitrated in an orderly fashion, since no bus can serve two masters at the same time. For 
a master to access the buses, it must ask permission from the central bus arbitrator and 
wait for a response before it proceeds. Depending on the system design, the central arbi- 
trator can assign access to each master according to a priority scheme or on a first-come- 
first-served basis. 
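
The two arbitration schemes mentioned above, priority and first-come-first-served, can be contrasted with a toy model. This is a sketch under simplifying assumptions: the class name BusArbiter, the master names, and the rule that a lower number means higher priority are all hypothetical and are not taken from any bus specification.

```python
from collections import deque


class BusArbiter:
    """Toy central arbiter: grants the bus to one requesting master at a time."""

    def __init__(self, policy="priority"):
        self.policy = policy
        self.pending = deque()  # (master name, priority) in arrival order

    def request(self, master, priority=0):
        """A master asks for the bus and then waits for a grant."""
        self.pending.append((master, priority))

    def grant(self):
        """Pick the next master to own the bus, or None if nobody is waiting."""
        if not self.pending:
            return None
        if self.policy == "fcfs":
            return self.pending.popleft()[0]
        # Priority policy: lowest number wins; ties go to the earliest request.
        best = min(self.pending, key=lambda item: item[1])
        self.pending.remove(best)
        return best[0]


for policy in ("priority", "fcfs"):
    arbiter = BusArbiter(policy)
    arbiter.request("CPU", priority=2)
    arbiter.request("DMA", priority=0)
    arbiter.request("NIC", priority=1)
    print(policy, [arbiter.grant() for _ in range(3)])
```

With the same three requests, the priority policy serves DMA, NIC, CPU, while first-come-first-served serves them in arrival order.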


Bus protocol 


To coordinate activity among various parts of the system, buses must follow a 
strict set of timing and signal specifications. The term bus protocol refers to these speci- 
fications for a given bus. The two major bus protocols are synchronous and asynchronous. 
In synchronous protocol, bus activity is synchronized according to a central frequency, the 
e 


676 


Table 26-3: PC104 Signals 


J2/P2 J1/P1 


Row D RowC Row B RowA 
GND GND GND IOCHK* 

MEMCS16* SBHE* RESET SD7 
IOCS16* LA23 +5V SD6 
IRQ10 LA22 IRQ9 SD5 
IRQ11 ~LA21 -5V SD4 
IRQ12 LA20 DRQ2 SD3 
IRQ15 -12V SD2 
IRQ14 SRDY* SD1 
+12V SDO 

KEY IOCHRDY 
SMEMW* AEN 
SMEMR* SA19 
IOW* SA18 
IOR* SA17 
DACK3* SA16 
DRQ3 SA15 
DACK1* SA14 
MASTER* DRQ1 SA13 
GND REFRESH* SA12 
BCLK SA11 

IRQ7 


aL 
5 
v 
5 


a a oa 
—— a 


system frequency. In the x86 PC, the CPU accesses memory using synchronous protocol 
since memory cannot deviate from the timing specifications of the central clock oscilla- 
tor. Asynchronous protocol obeys its own timing in that it decides when it is ready and 
does not operate according to the central clock frequency. Printer interfacing in the x86 
PC is an example of asynchronous bus protocol. As discussed in Chapter 18, if the CPU 
is to send data to the printer, it must continuously monitor the printer's busy signal; only 
when the printer is not busy (ready) can it issue data to the printer's data bus. The CPU 
must also signal the availability of the data to the printer by the strobe signal and wait for 
its acknowledgment. In the asynchronous method of CPU-printer communication, the 
CPU is the master and the printer is the slave. The slave (printer) obeys its own timing for 


——— cca, 
CHAPTER 26: ISA, PC104, AND PCI BUSES 677 


the acknowledge signal, independent of the system frequency. However, in CPU-memory 
communication, memory timing specifications are according to the system frequency and 
the CPU does not poll memory to see if it is ready to accept data. Asynchronous protocol 
is used when there is a mismatch between the bus timing of the master and slave. 
Normally, the slave is slower than the master and has self-timing, whereas in synchronous 
protocol, the timing of the master and slave match. Synchronous protocol generally has a 
higher rate of data transfer than asynchronous protocol. 
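
The polled handshake described above can be sketched as a short simulation. This is only an illustration under stated assumptions, not the PC's actual printer-port code: the class SlowPrinter, its busy_polls parameter, and the send function are hypothetical names introduced here. The point it shows is that the master wastes time polling because the slave sets the pace.

```python
class SlowPrinter:
    """Hypothetical slave that stays busy for a fixed number of polls after each byte."""

    def __init__(self, busy_polls=3):
        self.busy_polls = busy_polls   # how long the device takes, in polls (assumed)
        self._remaining = 0            # polls left before the device is ready again
        self.received = []

    def is_busy(self):
        # The device finishes on its own schedule; each poll only observes it.
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False

    def strobe(self, byte):
        # Latch the byte, go busy, and return True to model the acknowledge.
        self.received.append(byte)
        self._remaining = self.busy_polls
        return True


def send(printer, data):
    """Master side: poll busy, then strobe each byte. Returns the number of wasted polls."""
    wasted_polls = 0
    for byte in data:
        while printer.is_busy():
            wasted_polls += 1
        if not printer.strobe(byte):
            raise RuntimeError("no acknowledge from device")
    return wasted_polls


printer = SlowPrinter(busy_polls=3)
print(send(printer, b"Hi!"), bytes(printer.received))
```

In a synchronous transfer the polling loop disappears, since the slave is required to meet the timing of the system clock.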


Definition and merits of local bus 


Just as a high-performance car needs high-performance roads (no bumps, no 
speed limit) to explore its full potential, high-performance CPUs also require high-per- 
formance buses. While microprocessor performance is rapidly rising, buses are not keep- 
ing up. Many high-performance supercomputers and mainframes use their own propri- 
etary buses, but their limited use makes them nonstandard and consequently expensive. 
When 286 microprocessors of 10 to 16 MHz were used, many manufacturers resorted to 
proprietary buses to overcome the 8-MHz limitation associated with the ISA bus. This was 
especially the case where memory was concerned. In 80286/386 systems with 16- or 20- 
MHz speed, memory boards plugged into expansion slots could be accessed no faster than 
8 MHz. This fact led manufacturers such as Compaq to have their own memory expan- 
sion modules. In such systems, while ISA expansion slots are used for peripheral boards 
such as video, hard disk, or network cards, memory expansion was done by a specially 
designed slot on the motherboard used only for memory modules. These memory mod- 
ules work at the same speed as the CPU, or close to it. These systems were often adver- 
tised as dual-bus systems. One bus was for the ISA cards and the other one was for the 
memory modules. In the late 1990s with the widespread adoption of SIP (single in-line
pin), SIMM (single in-line memory module), and DIMM (dual in-line memory module), 
this problem was resolved. However, the lack of a bus standard for video and other adapter 
cards such as disk controllers forced PC board designers to come up with what is called a 
local bus. The idea of a local bus is to access the system buses at the same speed as the 
microprocessor, or close to it. In a 33-MHz microprocessor system with both ISA and 
local buses, the speed of the ISA bus signals is limited to 8 MHz, but the local bus signals 
are accessed at the same speed as the CPU, 33 MHz. In PC/XT systems of 4.7 MHz, the 
XT buses were accessed at the same speed as the 8088 microprocessor. The gap between 
CPU speed and expansion slot speed started to develop when the 80286 speed exceeded 
8 MHz. In those days, there were not many devices that needed speed beyond 8 MHz. 
This changed with the introduction of graphical user interface (GUI) software such as 
Microsoft Windows. In ISA bus systems, even the 16-bit video card plugged into the ISA 
expansion slot was not fast enough to keep up with the demand of the graphics software. 
This led some PC manufacturers to embed the video card into the motherboard and bypass 
the use of an ISA expansion slot for the video board. The problem with this option is that 
if the video section of the motherboard goes bad, one must either discard the motherboard 
or connect a video card to the expansion slot, depending on how the system board is 
designed. To solve the problem of slow video speed in ISA systems, some video board 
makers used a graphics processor to relieve the main CPU, the 386/486, from the burden 
of manipulation of graphic data stored in video RAM. In the absence of a graphic proces- 
sor on the video board, the main CPU is responsible for graphic data manipulation, which 
means that it must go through the slow ISA bus to access the data since the x86 CPU is 
connected to the video RAM through the ISA bus. The use of a specially designed proces- 
sor called a graphic processing unit (GPU) on the video board with the sole responsibili- 
ty of taking care of the calculation-intensive work of graphics provided a major improve- 
ment in the video system of the PC. However, it has one limitation. If the graphic data 
needs to be transferred from the disk to video RAM (or vice versa), it must still go through 
the slow ISA bus. Table 26-4 shows the bus bandwidth requirements for graphics and
real-time video. Table 26-4 and Example 26-3 explain why the push for a high-speed bus
was led by Intel: it wanted an improvement in bus performance lest its processors be
buried under slow buses. The resulting bus standard is called PCI (peripheral component
interconnect).




Table 26-4: Bus Bandwidth Requirements for Graphics and Real-Time Video


[Graphics rows (redraw rate in updates/s, bandwidth in bytes/s) are not legible.]


Real-Time Video

Resolution     Color Depth (bits)    Frame Rate (frames/s)    Bandwidth (bytes/s)
640 x 480      24                    30                       26.3M
1024 x 768     24                    30                       67.5M


Example 26-3


Verify the bus bandwidth requirement for each of the following. 
(a) 1024 x 768 resolution, 16-bit color, redraw rate of 10 updates per second
(b) 640 x 480 resolution, 24-bit color, 30 frames per second


Solution: 


(a) Bus bandwidth = 1024 x 768 x 16 x 10 = 125,829,120 bits/second = 15 megabytes/second 
(b) Bus bandwidth = 640 x 480 x 24 x 30 = 221,184,000 bits/second = 26.3 megabytes/second 
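
The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch: the function name display_bandwidth is a hypothetical helper, and it assumes, as the figures above imply, that 1 megabyte is taken as 2**20 bytes.

```python
def display_bandwidth(width, height, bits_per_pixel, updates_per_second):
    """Return (bits per second, megabytes per second) for full-screen updates."""
    bits_per_second = width * height * bits_per_pixel * updates_per_second
    # Assumption: 1 megabyte = 2**20 bytes, which reproduces the 15 and 26.3 figures.
    megabytes_per_second = bits_per_second / 8 / 2**20
    return bits_per_second, megabytes_per_second


print(display_bandwidth(1024, 768, 16, 10))   # (125829120, 15.0)
print(display_bandwidth(640, 480, 24, 30))    # (221184000, 26.3671875)
```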


PCI bus 


High-performance microprocessors such as the Pentium require a high bus band- 
width to take advantage of their full potential. Therefore, it is not surprising that Intel 
became involved in defining a new bus standard. Although Intel came up with the speci- 
fications of the PCI local bus, this standard has become available free of charge to all PC 
and add-in board manufacturers. PCI was conceived as a specification standard for periph- 
eral connections for Intel's high-performance microprocessors such as the 80486 and 
Pentium. Later, with encouragement and input from the PC industry, it became a local bus 
standard with the pin-out for expansion slot connections. It has incorporated the follow- 
ing major characteristics: (a) burst mode data transfer, (b) level-triggered interrupts, (c) 
bus mastering, (d) automatic configuration, and (e) high bus bandwidth. More important, 
it has a bridge, which allows any kind of add-in card based on ISA to be plugged into the 
PCI local bus. PCI local bus characteristics are listed next. 


PCI local bus characteristics 


1. It has a maximum speed of 33 MHz. 

2. It has 32- and 64-bit data paths. 

3. It supports burst mode data transfer of 2-1-1-1 used by microprocessors such as the 
Pentium. 

4. It supports bus mastering, allowing the implementation of multiprocessors where any 
number of microprocessors can become master and take control of the buses. 

5. It is compatible with ISA. With implementation of a bus bridge, it supports the slow 
ISA bus as shown in Figure 26-15. Buffers in the bridge allow the microprocessor to 
write into the buffer and go about its own business, leaving the task of handling the 
slow ISA to the bridge. 

6. The PCI local bus is processor independent. It can be used with any microprocessor,
not just Intel x86. For this reason, many other companies have also supported PCI and
used it with their non-x86 microprocessors. See Figure 26-16.


[Block diagram. Labeled blocks: Memory Controller; PCI Bus Controller; PCI local bus
(32-bit data path at 33 MHz) with SCSI II Controller, Fast LAN, Graphics, and Multimedia;
Expansion Bus Controller (ISA/EISA) with a 16-bit data path at 8 MHz.]

Figure 26-15. PCI Local Bus Architecture
(Reprinted by permission of Intel Corporation, Copyright Intel, 1993)

7. It supports both 5- and 3.3-V expansion cards, allowing smooth transition from 5- to 
3.3-V systems. The placing of small cutouts (keys) prevents users from plugging a 
card with one voltage into a motherboard with a different voltage. 

8. It provides autoconfiguration capability, where a user can install a new add-in card
without setting DIP switches or jumpers or selecting the interrupt. Configuration
software automatically selects an unused address and interrupt to resolve conflicts.

9. It has a ground or VCC pin between every two signals to reduce crosstalk and radio- 
frequency emissions. See Figure 26-17. 

10. It implements level-triggered interrupts, which support interrupt sharing. 

11. It supports up to 10 peripherals. Some of the peripherals must be embedded into the 
motherboard. 

12. The maximum number of expansion slots working at 33 MHz varies, depending on 
the 5-V or the 3.3-V implementation. The increase in the number of expansion slots 
beyond 5 means a speed lower than 33 MHz. The use of a highly refined connector 
with a small area of contact makes the PCI bus a high-frequency bus. 


Plug-and-play feature 


The PCI is equipped with the autoconfiguration feature but at the same time it has 
a slot for the ISA bus, in which autoconfiguration is not supported. How can this work? 
This lack of autoconfiguration is a major headache for computer users and network man- 
agers. This led Microsoft and Intel to work together to equip the ISA bus with the auto- 
configuration feature. This feature is often referred to as plug and play. The PCI autocon- 
figuration feature can work completely only after the ISA cards and BIOS are equipped 
with autoconfiguration (plug and play). Plug and play falls into the following three cate- 
gories. 


Required Pins
  Address and Data:            AD[31::00], C/BE[3::0]#, PAR
  Interface Control:           FRAME#, TRDY#, IRDY#, STOP#, DEVSEL#, IDSEL
  Error Reporting:             PERR#, SERR#
  Arbitration (masters only):  REQ#, GNT#
  System:                      CLK, RST#

Optional Pins
  64-Bit Extension:            AD[63::32], C/BE[7::4]#, PAR64, REQ64#, ACK64#
  Interface Control:           LOCK#
  Interrupts:                  INTA#, INTB#, INTC#, INTD#
  Cache Support:               SBO#, SDONE
  JTAG:                        TDI, TDO, TCK, TMS, TRST#

Figure 26-16. PCI Pin List 
(Reprinted by permission of PCI Special Interest Group, Copyright 1992, 1993) 


1. Neither the motherboard BIOS nor the add-in card is equipped with the plug-and- 
play feature. This is sometimes called "plug and pray." You may get it to work by trial 
and error. 

2. The motherboard BIOS is equipped with plug and play, but the add-in card is not. In 
this case, setup software will help you to assign the I/O addresses, IRQs, and DMA 
channels. 

3. Both the motherboard BIOS and the add-in card are equipped for plug and play. In 
this case, autoconfiguration will take care of everything. It will assign I/O addresses, 
IRQs, and DMA channels without any user involvement. 


PCI connector 


A few points must be noted about the PCI connector. First, notice in Figure 26-17 
that very few PCI signals match the signals of x86 microprocessors. The reason is that 
PCI is a mezzanine bus, meaning that the PCI controller sits between the CPU and the 
external bus connection. In this way, any CPU can be used with the PCI bus. 
Standardizing the bus connection frees the CPU buses from any restriction. This is in con- 
trast to some other buses, in which signals come directly from the x86 pins and have the 
same name. When new signals were added to the Pentium, the bus had to be upgraded. 
PCI solves this problem by being microprocessor independent. 

Another point to be noted is the multiplexing of address and data on the PCI bus, 
since the same pins are used for address and data. In the first clock, the address is provid- 
ed, and in the second clock, the data is provided. Therefore, the PCI bus has a cycle time 
of 2 clocks in nonburst mode, just like the Pentium. For burst mode, in the first clock the 
address is provided and in each subsequent clock a word (32-bit) of data is provided. 

Another point to be noted is the 64-bit extension for the PCI bus. The PCI bus can
be implemented for a 32-bit data bus or a 64-bit data bus. The 32-bit section ends at pin
62. The pinout is shown in Figure 26-17. Pins 63 through 94 are used for the 64-bit
data/address extension only.

Notice also in Figure 26-17 that every third pin is dedicated to ground or VCC.
This reduces the crosstalk problem and allows the bus to be used at high frequencies.


[Pinout table for the 5V and 3.3V environments, Side B and Side A, pins 1 through 94.
Pins 1 through 62 carry the JTAG, power, interrupt (INTA# to INTD#), arbitration (REQ#,
GNT#), AD[31] to AD[00], C/BE[3]# to C/BE[0]#, interface control, and REQ64#/ACK64#
signals. Pins 63 through 94 carry C/BE[7]# to C/BE[4]#, PAR64, and AD[63] to AD[32].
The two environments differ in the position of the connector keys and in the voltage of
the I/O power pins.]

Figure 26-17. Pinout of the PCI Connector
(Reprinted by permission of PCI Special Interest Group, Copyright 1992, 1993)


Figure 26-18 illustrates the PCI board connectors. 


5V Board 3.3V Board 


I/O buffers powered on 5 V rail      I/O buffers powered on 3.3 V rail


Figure 26-18. PCI Board Connectors 
(Reprinted by permission of PCI Special Interest Group, Copyright 1992, 1993) 


PCI performance 


The PCI local bus supports both single memory cycle and burst mode. In the single
cycle, it takes 2 clocks to read or write a word of data. In burst mode, the address is
provided in the first clock and the data is accessed in each subsequent clock. This makes
it 2-1-1-1-1-1.... Example 26-4 calculates the bus bandwidth for the PCI.

Table 26-5 provides the performance comparison of all the buses for non-burst 
mode data transfer. Table 26-6 shows the performance of buses and ports. 


Example 26-4 
Calculate the bus bandwidth of PCI for (a) single and (b) burst transfer, both on a 32-bit data path. 


Solution: 


PCI can work up to a maximum of 33 MHz. The clock period is 30 ns. 

(a) For the single transfer, each transfer takes 2 clocks or a total of 60 ns to transfer 4 bytes (32 bits) 
of data. Therefore, bus bandwidth = (1/60 ns) x 4 bytes = 66.6 megabytes/second 

(b) In burst mode, ignoring the overhead of the first clock for the address, it takes 1 clock or 30 ns 
to transfer 32-bit data. Therefore, bus bandwidth = (1/30 ns) x 4 bytes = 133 megabytes/second. 
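
The same calculation can be sketched in Python. The function name pci_bandwidth and its parameters are hypothetical, and the 30 ns clock period is the rounded value used in this example. The third call goes one step beyond the example by charging the address clock against a burst of four words (the 2-1-1-1 pattern), which gives the effective rate for a short burst.

```python
CLOCK_NS = 30  # clock period assumed in Example 26-4 for a 33 MHz bus


def pci_bandwidth(bytes_per_word, data_words, count_address_clock=True):
    """Return megabytes/second (1 MB = 10**6 bytes) for one PCI transaction."""
    clocks = data_words + (1 if count_address_clock else 0)
    seconds = clocks * CLOCK_NS * 1e-9
    return bytes_per_word * data_words / seconds / 1e6


single = pci_bandwidth(4, 1)                              # 2 clocks per word
burst_peak = pci_bandwidth(4, 1, count_address_clock=False)  # address clock ignored
burst_of_4 = pci_bandwidth(4, 4)                          # 5 clocks for 4 words

print(round(single, 1), round(burst_peak, 1), round(burst_of_4, 1))
```

A longer burst spreads the single address clock over more data words, so the effective rate approaches the 133 megabytes/second peak.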


Table 26-5: ISA, EISA, and PCI Local Bus Bandwidth Comparison 


Note: In the bus bandwidth calculation, 2 clocks per memory cycle are assumed.




Table 26-6: Data Transfer Rate for Buses and Ports

[Table body is not legible. Legible fragments: column headings "Data Path (bits)" and
"Maximum Bus Bandwidth"; an entry of 8M bytes/second; an 8-bit entry of 500K bytes/second.]


Note: In the bus bandwidth calculation, 2 clocks per memory cycle are assumed.


Review Questions 


1. What is the local bus?

2. The memory expansion connections to 16-MHz CPUs are through the ______ (ISA
bus, local bus).

3. What is a dual bus system?

4. Which needs a local bus, the modem or the hard disk controller?

5. True or false. PCI is a 32- and a 64-bit bus.

6. How has PCI reduced the effects of crosstalk (EMI) for high-frequency systems?


PCI Express (PCIe) is becoming the standard for 64-bit systems.


Use Google to research the PCIe bus.


PROBLEMS 


SECTION 26.1: ISA BUS MEMORY SIGNALS 


1. Which of the control signals are used for ISA memory interfacing if the address of
memory is in the range of F0000-FFFFFH?

2. The ISA bus can access a maximum of ______ bytes. Why?

3. The MEMCS16 is an active-______ signal. Is this an input signal?

4. Explain the use of the MEMCS16 pin.

5. The ZEROWS is an active-______ signal. Is this an input signal?

6. Explain the use of the ZEROWS pin.

7. If the MEMCS16 pin is high, what portion of the data bus is being used?

8. If the MEMCS16 pin is low, what portion of the data bus is being used?

9. If the ZEROWS pin is high, give the memory cycle time for an 8-bit data transfer.

10. If the ZEROWS pin is high, give the memory cycle time for a 16-bit data transfer.

11. If the ZEROWS pin is asserted low, give the memory cycle time for a 16-bit data
transfer.

12. If the ZEROWS pin is asserted low, give the memory cycle time for an 8-bit data
transfer.

13. To achieve the best data transfer rate for ISA bus memory interfacing, what should be
the status of the MEMCS16 and ZEROWS pins?


14. Fill in the blanks for the following cases.


MEMCS16 ZEROWS Data bus used Read Cycle time Bus Bandwidth


15. Why do we use DIMM and SIMM sockets for memory expansion instead of the ISA
bus slot?


SECTION 26.2: I/O BUS TIMING IN ISA BUS 


16. Explain the role of ZEROWS in I/O timing.

17. Explain the role of IOCS16 in 16-bit I/O.

18. In the ISA bus, the default mode for I/O operation is ______ (8-bit, 16-bit).

19. What is the clock speed for the ISA bus?

20. How many WS are used in the 8-bit standard I/O cycle?

21. In Problem 20, how much time does it take to complete one I/O read cycle?

22. How many WS are used in the 8-bit I/O cycle when ZEROWS is asserted?

23. In Problem 22, how much time does it take to complete one I/O read cycle?

24. How many WS are used in the 16-bit standard I/O cycle?

25. In Problem 24, how much time does it take to complete one I/O read cycle?

26. The IOCS16 pin is an ______ (input, output) and an active-______ (low, high) signal.

27. The ZEROWS pin is an ______ (input, output) and an active-______ (low, high) signal.

28. Calculate the bus bandwidth for 8-bit standard I/O of the ISA bus.

29. Calculate the bus bandwidth for 16-bit standard I/O of the ISA bus.

30. What is the function of the CHANRDY pin?

31. Explain how we can extend the I/O cycle time of the ISA bus.


SECTION 26.3: PCI BUS 


32. The PCI has a maximum speed of ______ MHz.

33. True or false. The PCI bus can accommodate 64-bit data buses of the Pentium.

34. True or false. The PCI bus supports autoconfiguration.

35. True or false. Interrupt sharing is not allowed in PCI.

36. Calculate and compare the maximum bus bandwidth for the 32-bit PCI. Assume that
it is non-burst mode.

37. Calculate the maximum bus bandwidth for the following. Assume burst mode.
(a) 32-bit PCI (b) 64-bit PCI


ANSWERS TO REVIEW QUESTIONS 


SECTION 26.1: ISA BUS MEMORY SIGNALS 


SA beat 


Low 

D0-D7 

Low 

6 clocks since 2 + 4 WS = 6
8 megabytes/sec.


SECTION 26.2: I/O BUS TIMING IN ISA BUS 


1. A0-A9

2. With ten address lines, A0-A9, we get 1024 I/O devices.

3. By asserting it, we tell the system board to shorten the I/O bus cycle time.

4. Input

5. IOCS16

6. 3 clock cycles (only 1 WS) if we assert ZEROWS. That gives us 3 x 125 ns = 375 ns.
The maximum bus bandwidth is achieved with the 16-bit data bus and it is 1/(3 x 125
ns) x 2 = 5.33 megabytes per second.


SECTION 26.3: PCI BUS 


1. It is the bus that is closely attached to the CPU and works at the same frequency as
the CPU (or close to it).

2. Local bus

3. A system that has both a PCI bus and a DIMM memory bus.

4. Hard disk controller

5. True

6. By placing a ground or Vcc pin between every two signal lines


CHAPTER 27 


USB PORT PROGRAMMING 


OBJECTIVES 


Upon completion of this chapter, you will be able to: 


>> Compare USB performance with PCI
>> Compare USB performance with LPT and serial COM ports
>> Understand the difference between the host and peripheral devices
>> Define the terms upstream and downstream
>> Describe the role of hubs in USB port expansion
>> Understand the difference between A-type and B-type connectors
>> Describe the USB cable signals
>> Describe the differences between bus and self-powered hubs in USB
>> Understand the current needs of peripheral devices and hosts
>> Program USB devices in the Windows operating system using C#


This chapter deals with the basics of USB ports and how we program them in the 
x86 PC using the C# language. In Section 27.1, we examine some general features of the
USB port and compare its performance with the PCI bus and COM port. Section 27.2 
deals with USB port expansion, power management, and cables. USB programming using 
Microsoft's C# is discussed in Section 27.3. 


SECTION 27.1: USB PORTS: AN OVERVIEW 


USB stands for universal serial bus. Next to PCI, it is one of the most important 
additions to the PC system in recent years. In this section we provide an overview of the 
USB bus. To see the need for the USB bus, we first review the limitations and benefits of 
ISA and PCI buses, as well as serial and parallel ports. ISA and PCI buses provide a high 
rate of data transfer between the CPU and the outside world. This is because they have a 
wide data path, high-frequency bus speed, and communicate directly with the CPU. As 
mentioned in previous chapters, the ISA bus is a 16-bit bus and has a speed of 8 MHz. For
the PCI bus, the data path is up to 64 bits wide and the maximum speed is 66 MHz. The
following are some of the major limitations of the ISA and PCI buses:


1. Both ISA and PCI are inside the PC; therefore, to access them you need to open the
PC's case and plug the card into an expansion slot.

2. The PCI and ISA expansion slots take too much physical space on the motherboard.
This limits the number of expansion slots that are available on a given motherboard.

3. Both ISA and PCI buses require too much power. For every ISA expansion slot, an
extra 25 watts must be incorporated into the PC's power supply. Every PCI slot
requires an extra 10 watts. As a result, a motherboard with 2 ISA and 3 PCI expansion
slots has burdened the PC power supply with an additional 80 watts of power
(2 x 25 + 3 x 10 = 80). This can make a significant difference in handheld and
laptop systems, and that is the reason that neither of these devices has any expansion
slots on its boards.


The most important disadvantage of serial and parallel ports is the limit of 4 of
each in a given motherboard. The PC BIOS limits the number of serial (COM) and paral- 
lel (LPT) ports to 4. Of course, this is a theoretical limitation imposed by the BIOS. 
Practically speaking, there are a limited number of IRQs to assign to all of these LPT and 
COM ports. This is the reason that there are no more than 2 COM ports and 1 LPT port 
on motherboards. There are other limitations associated with the COM and LPT ports that 
come to light only when compared with the major features of the USB port. 


Major features of USB 


Here are some of the most important features of the USB. They demonstrate why 
this is one of the most important additions to PC architecture in recent years. 


1. A single USB port can accommodate up to 127 devices such as a mouse, scanner, 
printer, modem, and so on. The devices are daisy chained together with the help of 
external hubs. The devices are recognized automatically by the PC. Many devices 
such as printers and monitors can be equipped with a hub, thereby saving the addi- 
tional expense of buying a separate hub. More importantly, daisy chaining the 
devices via hubs requires no opening of the PC case when connecting additional 
devices to a PC. 

2. The data transfer rates for USB 1.1 are 1.5 and 12 megabits per second (Mb/s).
USB 2.0 raised the maximum limit to 480 Mb/s.

3. USB is hot-pluggable. This means that new devices can be connected to the USB port
without first turning off the PC. Remember that this was not the case with ISA and
PCI. With those buses, the PC had to be turned off prior to installing the device and
configuring the system. The hot-pluggability of USB is one of its most important
features.

4. USB does not have to burden the system's power supply. Unlike ISA and PCI,
connecting additional USB devices to a PC does not require an exorbitant amount of
power from the PC power supply. Each new device requires no more than
500 mW from the PC power supply. More importantly, the USB hubs sitting
outside the PC can have their own power supply, thereby relieving the motherboard's
power supply of the burden of providing power to every device. For example, a
printer or monitor that is equipped with a USB hub can provide power to all external
USB devices. The USB is also equipped with power management capability. This
allows a device that is not being used for a period of time to be powered down into
sleep mode. Starting with Windows 98, USB device management is part of the
operating system; the same is true for Windows NT/2000, XP, and Vista.


Bus comparison 


Table 27-1 shows the comparison of ISA, PCI, COM, LPT, and USB. Notice in 
the calculation of the data transfer rate (bus bandwidth) for ISA and PCI buses, that a 2- 
clock read and write cycle is assumed. For the LPT port, the 2-microsecond timing for the 
parallel port is assumed. Both Microsoft and Intel have worked hard to eliminate ISA bus, 
LPT, and COM ports from PC motherboards. In many of today’s desktop PCs we have 
PCI and USB ports only. 


Table 27-1: USB Bus Performance Comparison


Bus/Port        Data Path (bits)    Maximum Bus Bandwidth


ISA (8 MHz)     16                  8M bytes/second
PCI (33 MHz)    32                  66M bytes/second
PCI (33 MHz)    64                  133M bytes/second
PCI (66 MHz)    64                  266M bytes/second
USB 1.0         1                   1.5M bits/second
USB 1.1         1                   12M bits/second
USB 2.0         1                   480M bits/second
LPT             8                   500K bytes/second
COM             1                   56K bits/second


Note: In the bus bandwidth calculation, 2 clocks per memory cycle are assumed. 
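
The ISA, PCI, and LPT rows of Table 27-1 follow from the stated assumptions and can be reproduced with a short sketch. The function names two_clock_bandwidth and lpt_bandwidth are hypothetical. Note that with clocks of exactly 33 and 66 MHz the 64-bit PCI rows come out as 132M and 264M; the table's 133M and 266M presumably reflect the nominal 33.3- and 66.6-MHz clocks.

```python
def two_clock_bandwidth(clock_hz, data_path_bits):
    """Bytes per second when each bus-wide transfer takes 2 clocks."""
    transfers_per_second = clock_hz / 2
    return transfers_per_second * (data_path_bits / 8)


def lpt_bandwidth(byte_time_s=2e-6):
    """Bytes per second for a parallel port that moves one byte every byte_time_s."""
    return 1 / byte_time_s


for name, clock_hz, bits in [("ISA", 8e6, 16), ("PCI", 33e6, 32),
                             ("PCI", 33e6, 64), ("PCI", 66e6, 64)]:
    print(name, bits, two_clock_bandwidth(clock_hz, bits) / 1e6, "M bytes/second")
print("LPT", 8, lpt_bandwidth() / 1e3, "K bytes/second")
```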


Review Questions 


1. True or false. Both PCI and USB are hot-pluggable. 

2. What advantage do USB devices have over other devices in terms of system power 
requirements? 

3. True or false. Each USB device must have its own USB connection to the system via 
an expansion slot. 


SECTION 27.2: USB PORT EXPANSION AND POWER MANAGEMENT


Host, peripherals, and hubs 


In discussing USB, we clarify some widely used terminology. In the USB, the 
host is the master and the peripheral I/O device is the slave, meaning that in any given
system there is only one host (master), which controls many peripheral I/O devices 
(slaves). The data going toward the host is called upstream and data flowing from the host 
toward peripherals is referred to as downstream. Many x86 IBM PC motherboards come 
with a single USB host that can be connected to peripheral I/O devices. Since a mother- 
board has only one host and it needs to connect to many peripheral I/O devices, we need 
what is called a hub. In motherboards with two or more USB ports, the hub is incorporat- 
ed into the host. This hub on the x86 motherboard's host is commonly referred to as the 
root hub. See Figure 27-1. The root hub on an x86 motherboard has anywhere from two 
to six ports, where each port can be connected to a peripheral I/O device or another hub. 




If we need to attach more peripheral I/O devices, then we use an external hub to expand 
the number of USB ports. A USB hub has one upstream port and several downstream 
ports and we must use a special USB cable to connect the external hubs together. The spe- 
cial cable used for connecting the hubs together has A-type and B-type connectors. The 
A-type connector is connected to one of the USB ports while the B-type connector is con- 
nected to the upstream port of the hub. Examining the USB cable with both A-type and B- 
type connectors, you will notice that they are shaped differently. See Figure 27-2. This 
should prevent any mixing of upstream and downstream connections. The vast majority 
of the USB cables you see in everyday use are USB extension cables. The USB extension 
cable has A-type connectors on both ends. It is used for connecting a peripheral I/O device 
such as a Flash memory stick to the USB port of the x86 PC. Such a cable has a male 
(plug) A-type connector on one end and a female (receptacle) A-type connector on the 
other end. See Figure 27-2. 


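[Diagram: the host's root hub connects downstream through a Type-A connector; a cable
with a Type-B connector plugs into the upstream port of an external hub, which fans out
to several downstream ports.]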


Figure 27-1. Host and Root Hub in x86 PC 


Expanding USB ports with external hubs 


As we discussed earlier, we can use an external hub to connect many peripheral 
I/O devices to a single host. There are two types of external hubs: bus-powered and self- 
powered. In the bus-powered hub, the power source comes from the root hub of the host 
PC via a USB cable. The self-powered hub has its own power source, and that is the
reason it is also referred to as an externally powered hub.




Pin 1: VBUS 
Pin 2: D+ 
Pin 3: D- 
Pin 4: GND 


Type-B Connector (Plug) 


Type-A Connector (Plug) 


Figure 27-2. USB Connectors 


Bus-powered hubs 


The vast majority of peripheral devices that we use in everyday life, such as Flash
memory sticks, are bus-powered, which means that the USB cable provides power. In the
x86 PC, the root hub provides power to all USB peripheral devices connected to any of
its USB ports. These bus-powered devices use USB cables to get power from the x86 PC's
power supply. Since the maximum current usage by a downstream peripheral device is
100 mA, a root hub with 6 USB ports uses no more than 600 mA (6 x 100 mA = 600 mA) of
current from the PC's power supply. By connecting bus-powered external hubs, we can
increase the number of USB ports. In that case, the host can provide a maximum of 500
mA of current (from a 5 V power source) per A-connector. Of the 500 mA provided to the
bus-powered external hub, it uses 100 mA for its own internal circuitry. That leaves only
400 mA to be given to the downstream peripheral ports. Because each downstream
peripheral port must be given 100 mA, the number of downstream peripheral ports in the
bus-powered hub is limited to 4 (4 x 100 mA + 100 mA = 500 mA).


Self-powered hubs 


In the self-powered external hub, we need a power source other than the mother- 
board. This generally comes from the wall outlet. Notice that the self-powered hub is also 
referred to as an externally powered hub. In the case of the self-powered external hub, 
each peripheral port is provided 500 mA. That means that an external self-powered hub 
with four downstream ports uses at least 2000 mA (500 mA x 4 = 2000 mA) in addition 
to the power it needs for its own internal circuitry. There are some hubs with more than 4 
USB connectors. They are self-powered with their own power source, since they need 
more than 2000 mA current. Printers with a USB port often have a self-powered hub, 
drawing from the printer's power supply. The devices with I/O functions and the hub are 
often called compound devices in the USB literature. See Figure 27-3. 


CHAPTER 27: USB PORT PROGRAMMING 691 


Figure 27-3. Bus-Powered and Self-Powered Hubs (labels: Host Root HUB, Bus-Powered HUB,
Self-Powered HUB with AC Power Source)


Daisy chaining the hubs 


The USB bus allows up to five levels of external hubs to be connected to the root 
hub. Of course, due to the limited power provided by the root hub, many of these hubs 
must be self-powered. Notice that USB documentation terminology uses the word tier 
instead of level. It also states that the total number of tiers (levels) cannot be more than 
seven. This means the number of levels from the farthest peripheral device to the host 
(root hub) must be no more than seven. The host (root hub) is counted as the first tier 
(level), and the number of external hubs that can be cascaded together is limited to five. 
This makes the peripheral device connected to the last hub tier number seven. See Figure 
27-4. 
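
The tier rule can be expressed as a small check. The sketch below is only an illustration with hypothetical names; it assumes the counting used in the text, where the root hub is tier 1 and each cascaded external hub adds one tier before the device.

```python
MAX_TIERS = 7           # farthest device may sit at tier 7
MAX_EXTERNAL_HUBS = 5   # external hubs that may be cascaded


def device_tier(external_hubs):
    """Tier of a device reached through `external_hubs` cascaded hubs."""
    # Tier 1 is the root hub, each external hub adds a tier,
    # and the device itself occupies the next tier.
    return 1 + external_hubs + 1


def chain_is_valid(external_hubs):
    """True if the hub chain respects both limits given in the text."""
    return (0 <= external_hubs <= MAX_EXTERNAL_HUBS
            and device_tier(external_hubs) <= MAX_TIERS)


print(device_tier(5), chain_is_valid(5))   # 7 True
print(device_tier(6), chain_is_valid(6))   # 8 False
```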


USB cable signals 


The USB cable has four wires inside it. They are Vcc (labeled VBUS in Figure 27-2),
Gnd, D+, and D-. While Vcc and Gnd provide the power source, D+ and D- are for the data
path. See Figure 27-2. The D+ and D- wires provide the data path between the host and the
peripheral device using the half-duplex method of data transfer. That means that at any
given time the data is either going from the device to the host or coming from the host to
the device, but never both at the same time. This is in contrast to the RS232 serial port,
which has full-duplex capability since it has both TxD (transmit data) and RxD (receive
data) pins. Notice that just like the RS232, the USB cable does not have any wire for a
clock to synchronize data transfer. That is the reason that USB is called an asynchronous
bus. The USB port uses the D+ and D- pins to implement the asynchronous method of data
transfer. It uses NRZI (non-return-to-zero inverted) encoding. The maximum cable length
for the USB is 5 meters (about 16 feet). In high-speed data transfers, the maximum length
needs to be less than 5 meters.
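
To make the NRZI idea concrete, the following Python sketch encodes and decodes a bit sequence. It is a simplified illustration, not a model of the full USB signaling: it assumes the usual USB convention that a 0 bit is sent as a change of line level and a 1 bit as no change, it represents the two line states as 1 and 0, and it omits the bit stuffing that real USB adds. The function names are our own.

```python
def nrzi_encode(bits, start_level=1):
    """Encode data bits as line levels: a 0 toggles the level, a 1 keeps it."""
    level = start_level
    levels = []
    for bit in bits:
        if bit == 0:
            level ^= 1          # a 0 bit is signaled by a transition
        levels.append(level)
    return levels


def nrzi_decode(levels, start_level=1):
    """Recover data bits: no change in level means 1, a change means 0."""
    previous = start_level
    bits = []
    for level in levels:
        bits.append(1 if level == previous else 0)
        previous = level
    return bits


data = [0, 1, 1, 0, 0, 1]
line = nrzi_encode(data)
print(line)                 # [0, 0, 0, 1, 0, 0]
print(nrzi_decode(line))    # [0, 1, 1, 0, 0, 1]
```

Because the receiver only looks for transitions, it can recover the data without a separate clock wire, which is the point made in the paragraph above.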


Figure 27-4. Daisy Chaining Hubs (labels: System (x86 PC), TIER 2, DEVICE)


USB enumeration


The USB devices are hot-pluggable, meaning that the moment we attach a USB
peripheral device to the USB port of the x86 PC, it is recognized by the host. That means
that there is no need to turn off the PC and then turn it back on in order for the device to
be recognized. When we connect a peripheral device to a USB port, the host senses the
voltage change on the data line via the cable. The host queries the device and, after
receiving a satisfactory answer, assigns it an address. The address value has seven bits,
which can take values from 0 to 127 (00-7F in hex, or 0000000-1111111 in binary).
However, the host assigns addresses between 1 and 127, since the address 0 is the default
address of any peripheral device before it is recognized by the host. This process of
recognizing a device and assigning it an address is called enumeration in the USB
literature. As long as this peripheral device is connected to the host and the host is
powered, this unique address value is not changed. If we disconnect the peripheral device
from the host, the host will take back the address and keep it in the pool
of 127 addresses available for future assignments. 
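
The bookkeeping side of enumeration can be pictured with a short Python sketch. It is only a model of the address pool described above: the class and method names are hypothetical, and the choice of always handing out the lowest free address is our own assumption, not something the USB specification or the text requires.

```python
class AddressPool:
    """Toy model of the host's pool of USB device addresses (1 to 127)."""

    def __init__(self):
        self.free = set(range(1, 128))   # address 0 is the pre-enumeration default
        self.assigned = {}               # device name -> address

    def attach(self, device_name):
        """Assign a free address to a newly attached device."""
        if not self.free:
            raise RuntimeError("no free USB addresses")
        address = min(self.free)         # assumption: lowest free address first
        self.free.remove(address)
        self.assigned[device_name] = address
        return address

    def detach(self, device_name):
        """Return the device's address to the pool when it is unplugged."""
        address = self.assigned.pop(device_name)
        self.free.add(address)
        return address


pool = AddressPool()
print(pool.attach("flash drive"))   # 1
print(pool.attach("printer"))       # 2
pool.detach("flash drive")
print(len(pool.free))               # 126
```
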

Review Questions


1. True or false. The USB is hot-pluggable.
2. True or false. Each USB peripheral device must have its own power source.
3. True or false. Data going from the host to the peripheral is called upstream.
4. What is the maximum current provided by the host to each peripheral?
5. What is the difference between the A side and the B side of a USB cable?


SECTION 27.3: USB PORT PROGRAMMING 




In this section we discuss how to program USB devices in the Windows operat- 
ing system. We use C# .NET software, which is available for free from Microsoft’s web 
site. 

In programming the peripherals connected to the x86 PC USB host, the following 
points must be remembered: 


1. A single USB host can connect with up to 127 peripheral devices. 

2. Although the x86 PC can have multiple USB hosts, we are assuming that there is
only one host in the x86 PC.

3. In the Windows OS, opening a connection to a peripheral device is like opening a 
file. 

4. The API provided with the driver handles where the data will go to or come from. 


Communicating via USB-Bluetooth 


As an example of programming a USB port using C#, we can use a USB-to-Bluetooth
device to communicate between two PCs. In such cases, each PC will have a
USB-Bluetooth device connected to its USB port. See Figure 27-5. Before we show the
programming of the USB port, we can see the communications between two PCs via
Bluetooth using HyperTerminal.


Figure 27-5. PC Communications via USB-Bluetooth Devices (labels: System (x86 PC), USB Port,
USB-Bluetooth Device, Wireless Communication Between Two Bluetooth Devices)


Assuming each PC has a USB-Bluetooth device connected to its USB port, the
following steps show how to use HyperTerminal to establish communication between
them:

1. Use the Bluetooth utility provided with the Bluetooth device to open a virtual COM
port on the PC.

2. Configure HyperTerminal to communicate on the virtual COM port assigned by the
Bluetooth utility.

3. Perform the same tasks on the second PC.

4. Send messages between the two PCs to verify communications.


In Program 27-1 we write the message "Hello World" to the USB-Bluetooth
device. In Program 27-2 we receive the message from the USB-Bluetooth device.


//Program 27-1 : Send data through a USB COM port.
//Must be compiled in Visual C# 2005 Express, which is
//available for free from Microsoft's website.


using System;
using System.IO.Ports;


namespace SerialComm {
    class SerialOut {
        static void Main()
        {
            // The following line will set the COM port parameters in C#.
            SerialPort com1 = new SerialPort("COM12", 9600, Parity.None,
                                             8, StopBits.One);
            com1.Open();    // Open virtual COM port 12.
            do
            {
                // Send the data through the COM port.
                com1.WriteLine("Hello World");
            }
            while (!Console.KeyAvailable);  // Repeat until a key is pressed.
            com1.Close();   // Close virtual COM port 12.
        }
    }
}


//Program 27-2 : Getting data through a USB COM port.


using System;
using System.IO.Ports;


namespace SerialComm {
    class SerialIn {
        static void Main()
        {
            // The following line will set the COM port parameters in C#.
            SerialPort com1 = new SerialPort("COM12", 9600, Parity.None,
                                             8, StopBits.One);
            com1.Open();    // Open virtual COM port 12.
            do
            {
                // Read a line from the COM port and display it.
                Console.WriteLine(com1.ReadLine());
            }
            while (!Console.KeyAvailable);  // Repeat until a key is pressed.
            com1.Close();   // Close virtual COM port 12.
        }
    }
}


By running the above two programs, the message is transferred via the USB port
to Bluetooth, and from Bluetooth wirelessly to another PC. Programs 27-1 and 27-2 are
similar to the programs we wrote for serial communication in Chapter 17.

Comparing Figure 27-6 with the null modem connection discussed in Chapter 17,
one might wonder, is there any way to communicate between two PCs via a null USB
cable? The answer is absolutely not, since two USB hosts cannot talk to each other
directly. In other words, the USB host can only talk to a USB device and that is the reason
we are using two USB devices to establish communication between two hosts.


Review Questions 


1. True or false. A USB host can talk to another USB host. 

2. True or false. A USB device can talk to another USB device on the same USB bus. 

3. True or false. A USB device can talk to a USB host. 

4. True or false. In the Windows OS, opening a connection to a USB device is like open- 
ing a file. 

5. Generally, how many hosts exist in a single x86 PC? 

PROBLEMS 


SECTION 27.1: USB PORTS: AN OVERVIEW 


1. Give the maximum number of USB, LPT, and COM ports that the BIOS of the x86
   PC can support.
2. Give the power allocation from the motherboard for PCI, ISA, and USB.
3. Give the maximum data transfer rate for the USB specifications 1.0, 1.1, and 2.0.
4. What does it mean if a device is hot pluggable?
5. Of the PCI, ISA, LPT, COM, and USB, indicate which one is hot pluggable.


SECTION 27.2: USB PORT EXPANSION AND POWER MANAGEMENT 


6. True or false. The x86 PC with a USB port comes with the host capability.
7. True or false. The x86 PC with multiple USB ports comes with a root hub.
8. True or false. The x86 PC with a root hub provides sufficient current to all the
   peripheral devices connected to it.
9. True or false. The x86 PC with a USB port needs an external power supply for any
   peripheral device connected to it.
10. True or false. Every hub must have one upstream port.
11. Define downstream and upstream as used in USB terminology.
12. Define host, peripheral device, and hub as used in USB terminology.
13. How much current does a host provide to a peripheral device?
14. How much current does a hub provide to a peripheral device?
15. In daisy-chaining USB, what is the maximum number of layers (tiers) we can have?
16. What is the difference between self-powered and bus-powered hubs?
17. Where do we use the USB cable with an A-type connector on both ends?
18. Where do we use the USB cable with an A-type connector on one end and a B-type
    connector on the other end?
19. How many pins does the USB cable have?
20. Give the function of each pin of the USB cable.
21. What is the maximum length of a USB cable?
22. What method does USB use to represent 0s and 1s?
23. True or false. The USB is half-duplex.
24. What is enumeration in USB?
25. Who performs the enumeration?
26. What is the maximum current provided by the host to a bus-powered hub?
27. What is the difference between the A-side and B-side of a USB cable?
28. What is the maximum number of USB peripheral devices that a single host can
    enumerate?


29. What is the range of numbers used for USB enumeration? 

30. True or false. The USB host changes the enumeration number periodically to make 
sure the peripheral device is alive and responding. 

31. True or false. Data going from a peripheral device to the host is called upstream. 

32. What is the maximum current usage by a downstream peripheral device? 

SECTION 27.3: USB PORT PROGRAMMING 


33. Combine Programs 27-1 and 27-2 into a single program and test it. 


ANSWERS TO REVIEW QUESTIONS 


SECTION 27.1: USB PORTS: AN OVERVIEW 


1. False
2. Devices connected together via a USB hub can share a power supply.
3. False


SECTION 27.2: USB PORT EXPANSION AND POWER MANAGEMENT 


1. True
2. False
3. False
4. 100 mA
5. The B side is connected to the upstream port on a hub while the A side is for
   downstream ports. They are also shaped differently to prevent confusion.


SECTION 27.3: USB PORT PROGRAMMING 


1. False
2. False
3. True
4. True
5. 1




APPENDIX A 


DEBUG 
PROGRAMMING 


OVERVIEW 


DEBUG is a program included in the MS-DOS operating systems
that allows the programmer to monitor a program's execution
closely for debugging purposes. Specifically, it can be used to examine
and alter the contents of memory, to enter and run programs, and to stop
programs at certain points in order to check or even change data. This
appendix provides a tutorial introduction to the DEBUG program. You
will learn how to enter and exit DEBUG, how to enter, run, and debug
programs, and how to examine and alter the contents of registers and
memory, plus some additional features of DEBUG that prove useful in
program development. Numerous examples of Assembly language
programming in DEBUG are given throughout, and the appendix closes with a
quick reference summary of the DEBUG commands.




First, a word should be said about the examples in this appendix. Within exam- 
ples, what you should type in will be represented in plain text caps: 


PLAIN TEXT REPRESENTS WHAT THE USER TYPES IN

and the response of the DEBUG program will be in bold caps:

BOLD CAPS REPRESENT THE COMPUTER'S RESPONSE


The examples in this appendix assume that the DEBUG program is in drive A and 
that your programs are on drive B. If your system is set up differently, you will need to 
keep this in mind when typing in drive specifications (such as "B:"). It is strongly sug- 
gested that you type in the examples in DEBUG and try them for yourself. The best way 
to learn is by doing! 


SECTION A.1: ENTERING AND EXITING DEBUG 


To enter the DEBUG program, simply type its name at the command prompt: 


C:\>DEBUG <return> 


"DEBUG" may be typed in either uppercase or lowercase. Again let us note that 
this example assumes that the DEBUG program is on the disk in drive A. After "DEBUG" 
and the carriage return (or enter key) is typed in, the DEBUG prompt "-" will appear on 
the following line. DEBUG is now waiting for you to type in a command. 

Now that you know how to enter DEBUG, you are ready to learn the DEBUG 
commands. The first command to learn is the quit command, to exit DEBUG. 

The quit command, Q, may be typed in either uppercase or lowercase. This is 
true for all DEBUG commands. After the Q and the carriage return have been entered, 
DEBUG will return you to the command prompt. This is shown in Example A-1. 


Example A-1 : Entering and Exiting DEBUG


C:\>DEBUG <return>
-Q <return>
C:\>


SECTION A.2: EXAMINING AND ALTERING REGISTERS 


The register command allows you to examine and/or alter the contents of the 
internal registers of the CPU. The R command has the following syntax: 


R <register name > 


The R command will display all registers unless the optional <register name> 
field is entered, in which case only the register named will be displayed. 


Example A-2 : Using the R Command to Display All Registers 


C:\>DEBUG <return> 

-R 

AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 
DS=15EF ES=15EF SS=15EF CS=15EF IP=0100 NV UP EI PL NZ NA PO NC 
15EF:0100 0AE4 OR AH,AH




After the R and the carriage return are typed in, DEBUG responds with three lines
of information. The first line displays the general-purpose, pointer, and index registers'
contents. The second line displays the segment registers' contents, the instruction pointer's
current value, and the flag register bits. The codes at the end of line two, "NV UP EI ...
NC", indicate the status of eight of the bits of the flag register. The flag register and its
representation in DEBUG are discussed in Section A.6. The third line shows some
information useful when you are programming in DEBUG. It shows the instruction pointed
at by CS:IP. The third line on your system will vary from what is shown above. For the
purpose at hand, concentrate on the first two lines. The explanation of the third line will
be postponed until later in this appendix.
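
As a preview of how the eight two-letter codes map onto the flag register, the Python sketch below builds the DEBUG flag display from a 16-bit flags value. It is an illustration only: the function name is our own, and it assumes the standard 8086 flag bit positions (CF = bit 0, PF = bit 2, AF = bit 4, ZF = bit 6, SF = bit 7, IF = bit 9, DF = bit 10, OF = bit 11) together with the usual DEBUG mnemonics for each flag.

```python
# (bit position, code shown when the flag is clear, code shown when it is set)
FLAG_CODES = [
    (11, "NV", "OV"),   # overflow
    (10, "UP", "DN"),   # direction
    (9,  "DI", "EI"),   # interrupt enable
    (7,  "PL", "NG"),   # sign
    (6,  "NZ", "ZR"),   # zero
    (4,  "NA", "AC"),   # auxiliary carry
    (2,  "PO", "PE"),   # parity
    (0,  "NC", "CY"),   # carry
]


def debug_flags(flags_word):
    """Return the flag codes in the order DEBUG prints them."""
    codes = []
    for bit, clear_code, set_code in FLAG_CODES:
        is_set = (flags_word >> bit) & 1
        codes.append(set_code if is_set else clear_code)
    return " ".join(codes)


print(debug_flags(0x0000))   # NV UP DI PL NZ NA PO NC
print(debug_flags(0x0200))   # NV UP EI PL NZ NA PO NC
```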

When you enter DEBUG initially, the general-purpose registers are set to zero and
the flag bits are all reset. The contents of the segment registers will vary depending on
the system you are using, but all segment registers will have the same value, which is
decided by the operating system. For instance, notice in Example A-2 above that all
segment registers contain 15EFH. It is strongly recommended not to change the contents of
the segment registers since these values have been set by the operating system. Note: In
a later section of this appendix we show how to load an Assembly language program into
DEBUG. In that case the segment registers are set according to the program parameters
and registers BX and CX will contain the size of the program in bytes.

If the optional register name field is specified in the R command, DEBUG will 
display the contents of that register and give you an opportunity to change its value. This 
is seen next in Example A-3. 


Example A-3 : Using the R Command to Display/Modify Registers 


(a) Modifying the contents of a register
-R CX
CX 0000
:FFFF
-R CX
CX FFFF
:

(b) DEBUG pads values on the left with zero
-R AX
AX 0000
:1
-R AX
AX 0001
:21
-R AX
AX 0021
:321
-R AX
AX 0321
:4321
-R AX
AX 4321

ec. VA 

^ Error 

(c) Entering data into the upper byte
-R DH
BR Error
-R DX
DX 0000
:4C00




Part (a) of Example A-3 first showed the R command followed by register name
CX. DEBUG then displayed the contents of CX, which were 0000, and then displayed a
colon ":". At this point a new value was typed in, and DEBUG prompted for another
command with the "-" prompt. The next command verified that CX was indeed altered as
requested. This time a carriage return was entered at the ":" prompt so that the value of
CX was not changed.

Part (b) of Example A-3 showed that if fewer than four digits are typed in,
DEBUG will pad on the left with zeros. Part (c) showed that you cannot access the upper
and lower bytes separately with the R command. If you type in any digit other than 0
through F (such as in "2F0G"), DEBUG will display an error message and the register
value will remain unchanged.
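
The rules just described for the ":" prompt can be summarized in a few lines of Python. This is a sketch of the behavior, not DEBUG's actual code: the function name is hypothetical, and the rejection of entries longer than four digits is our assumption for a 16-bit register.

```python
HEX_DIGITS = "0123456789ABCDEFabcdef"


def parse_register_entry(current, typed):
    """Return the register's new 4-digit hex value after the ':' prompt.

    `current` is the present value (for example "0000") and `typed` is
    what the user entered. An empty entry keeps the current value; an
    invalid entry raises ValueError, leaving the register unchanged.
    """
    typed = typed.strip()
    if typed == "":
        return current                      # <return> alone: no change
    if len(typed) > 4 or any(ch not in HEX_DIGITS for ch in typed):
        raise ValueError("Error")           # assumed handling of bad input
    return typed.upper().zfill(4)           # pad on the left with zeros


print(parse_register_entry("0000", "1"))     # 0001
print(parse_register_entry("0001", "21"))    # 0021
print(parse_register_entry("FFFF", ""))      # FFFF
```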

See Section A.6 for a discussion of how to use the R command to change the con- 
tents of the flag register. 


SECTION A.3: CODING AND RUNNING PROGRAMS IN DEBUG 


In the next few topics we explore how to enter simple Assembly language instruc- 
tions, and assemble and run them. The purpose of this section is to familiarize the reader 
with using DEBUG, not to explain the Assembly language instructions found in the exam- 
ples. 


A, the assemble command 


The assemble command is used to enter Assembly language instructions into 
memory. 


A <starting address> 


The starting address may be given as an offset number, in which case it is assumed 
to be an offset into the code segment, or the segment register can be specified explicitly. 
In other words, "A 100" and "A CS:100" will achieve the same results. When this com- 
mand is entered at the command prompt "-", DEBUG will begin prompting you to enter 
Assembly language instructions. After an instruction is typed in and followed by <return>, 
DEBUG will prompt for the next instruction. This process is repeated until you type a 
<return> at the address prompt, at which time DEBUG will return you to the command 
prompt level. This is shown in part (a) of Example A-4. 

Before you type in the commands of Example A-4, be aware that one important 
difference between DEBUG programming and Assembly language programming is that 
DEBUG assumes that all numbers are in hex, whereas most assemblers assume that num- 
bers are in decimal unless they are followed by "H". Therefore, the Assembly language 
instruction examples in this section do not have "H" after the numbers as they would if an 
assembler were to be used. For example, you might enter an instruction such as "MOV 
AL,3F". In an Assembly language program written for MASM, for example, this would 
have been typed as "MOV AL,3FH". 

As you type the instructions, DEBUG converts them to machine code. If you type 
an instruction incorrectly such that DEBUG cannot assemble it, DEBUG will give you an 
error message and prompt you to try again. Again, keep in mind that the value for the 
code segment may be different on your machine when you run Example A-4. Notice that 
each time DEBUG prompts for the next instruction, the offset has been updated to the next 
available location. For example, after you typed the first instruction at offset 0100, 
DEBUG converted this to machine language, stored it in bytes 0100 to 0102, and prompt- 
ed you for the next instruction, which will be stored at offset 0103. Note: Do not assem- 
ble beginning at an offset lower than 100. The first 100H (256) bytes are reserved by the 
operating system and should not be used by your programs. This is the reason that exam- 
ples in this book use "A 100" to start assembling instructions after the first 100H bytes. 


U, the unassemble command: looking at machine code 


The unassemble command displays the machine code in memory along with its
equivalent Assembly language instructions. The command can be given in either format


Example A-4 : Assemble, Unassemble, and Go Commands 


(a) Assemble command

-A 100
103D:0100 MOV AX,1
103D:0103 MOV BX,2
103D:0106 MOV CX,3
103D:0109 ADD AX,BX
103D:010B ADD AX,CX
103D:010D INT 3
103D:010E

(b) Unassemble command

-U 100 10D
103D:0100 B80100    MOV AX,0001
103D:0103 BB0200    MOV BX,0002
103D:0106 B90300    MOV CX,0003
103D:0109 01D8      ADD AX,BX
103D:010B 01C8      ADD AX,CX
103D:010D CC        INT 3

(c) Go command

-R
AX=0000 BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0100 NV UP DI PL NZ NA PO NC
103D:0100 B80100 MOV AX,0001

-G
AX=0006 BX=0002 CX=0003 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=010D NV UP DI PL NZ NA PE NC
103D:010D CC INT 3


shown below. 


U <starting address > <ending address> 
U <starting address > < L number of bytes> 


Whereas the assemble command takes Assembly language instructions from the
keyboard and converts them to machine code, which it stores in memory, the unassemble
command does the opposite. Unassemble takes machine code stored in memory and
converts it back to Assembly language instructions to be displayed on the monitor. Look at
part (b) of Example A-4. The unassemble command was used to unassemble the code that
was entered in part (a) with the assemble command. Notice that both the machine code
and Assembly instructions are displayed. The command can be entered either with
starting and ending addresses, as was shown in Example A-4: "U 100 10D", or with a starting
address and a number of bytes in hex. The same command in the second format would
be "U 100 LE", which tells DEBUG to start unassembling at CS:100 for E (14 decimal)
bytes, that is, offsets 100 through 10D. If the U command is entered with no addresses
after it: "U <return>", then DEBUG will display 32 bytes beginning at CS:IP. Successively
entering "U <return>" commands will cause DEBUG to display consecutive bytes of the
program, 32 bytes at a time. This is an easy way to look through a large program.
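
The offsets in part (b) of Example A-4 follow directly from the length of each instruction's machine code. The short Python sketch below recomputes them from the hex strings in the listing; the helper names are our own, and the little-endian decoding applies only to the three-byte "opcode plus 16-bit immediate" instructions shown.

```python
# Machine code column from part (b) of Example A-4.
MACHINE_CODE = ["B80100", "BB0200", "B90300", "01D8", "01C8", "CC"]


def instruction_offsets(start, codes):
    """Return the offset of each instruction and the next free offset."""
    offsets = []
    offset = start
    for code in codes:
        offsets.append(offset)
        offset += len(code) // 2        # two hex digits per byte
    return offsets, offset


def immediate16(code):
    """Decode the 16-bit operand of a 3-byte instruction (low byte first)."""
    low, high = code[2:4], code[4:6]
    return int(high + low, 16)


offsets, next_free = instruction_offsets(0x100, MACHINE_CODE)
print([f"{o:04X}" for o in offsets])    # ['0100', '0103', '0106', '0109', '010B', '010D']
print(f"{next_free:04X}")               # 010E
print(f"{immediate16('B80100'):04X}")   # 0001
```

The total length, 0EH bytes, is also the byte count that covers offsets 100 through 10D.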


G, the go command 


The go command instructs DEBUG to execute the instructions found between the 
two given addresses. Its format is 


G <= starting address> <stop address(es)>




If no addresses are given, DEBUG begins executing instructions at CS:IP until a 
breakpoint is reached. This was done in part (c) of Example A-4. Before the instructions 
were executed, the R command was used to check the values of the registers. Since CS:IP 
pointed to the first instruction, the G command was entered, which caused execution of 
instructions up until "INT 3", which terminated execution. After a breakpoint is reached, 
DEBUG displays the register contents and returns you to the command prompt "-". Up to 
10 stop addresses can be entered. DEBUG will stop execution at the first of these break- 
points that it reaches. This can be useful for programs that could take several different 
paths. 

At this point the third line of the register dump has become useful. The purpose 
of the third line is to show the location, machine code, and Assembly code of the next 
instruction to be executed. In Example A-5, look at the last line in the register dump given 
after the G command. Notice, at the leftmost part of line three, the value CS:IP. The val- 
ues for CS and IP match those given in lines one and two. After CS:IP is the machine 
code, and after the machine code is the Assembly language instruction. 

Part (a) of Example A-5 is the same as part (c) of Example A-4. The go command 
started at CS:IP and executed instructions until it reached instruction "INT 3". Part (b) 
gave a starting address but no ending address; therefore, DEBUG executed instructions 
from offset 100 until "INT 3" was reached. This could also have been typed in as "G 
=CS:100". Part (c) gave both starting and ending addresses. We can see from the register 


Example A-5 : Various Forms of the Go Command 


The program is first assembled:

-A 100
103D:0100 MOV AX,1
103D:0103 MOV BX,2
103D:0106 MOV CX,3
103D:0109 ADD AX,BX
103D:010B ADD AX,CX
103D:010D INT 3
103D:010E

(a) Go command in form "G"

-G
AX=0006 BX=0002 CX=0003 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=010D NV UP DI PL NZ NA PE NC
103D:010D CC INT 3

(b) Go command in form "G = start address"

-G =100
AX=0006 BX=0002 CX=0003 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=010D NV UP DI PL NZ NA PE NC
103D:010D CC INT 3

(c) Go command in form "G = start address ending address"

-G =100 109
AX=0001 BX=0002 CX=0003 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0109 NV UP DI PL NZ NA PE NC
103D:0109 01D8 ADD AX,BX

(d) Go command in form "G address"

-R IP
IP 0109
:0100
-G 109
AX=0001 BX=0002 CX=0003 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0109 NV UP DI PL NZ NA PE NC
103D:0109 01D8 ADD AX,BX


results that it did execute from offset 100 to 109. Part (d) gave only the ending address.
When the start address is not given explicitly, DEBUG uses the value in register IP. Be
sure to check that value with the register command before issuing the go command
without a start address.


T, the trace command: a powerful debugging tool 


The trace command allows you to trace through the execution of your programs
one or more instructions at a time to verify the effect of the programs on registers and/or
data.


T <= starting address> <number of instructions> 


This tells DEBUG to begin executing instructions at the starting address. 
DEBUG will execute however many instructions have been requested in the second field. 
The default value is 1 if no second field is given. The trace command functions similarly 
to the go command in that if no starting address is specified, it starts at CS:IP. The differ- 
ence between this command and the go command is that trace will display the register 
contents after each instruction, whereas the go command does not display them until after 
termination of the program. Another difference is that the last field of the go command is 
the stop address, whereas the last field of the trace command is the number of instructions 
to execute. 

Example A-6 shows a trace of the instructions entered in part (a) of Example A-4.
Notice the way that register IP is updated after each instruction to point to the next
instruction. The third line of the register display shows the instruction pointed at by IP,
that is, the next instruction to be executed. Tracing through a program allows you to
examine what is happening in each instruction of the program. Notice the value of AX after
each instruction in Example A-6.

The same trace as shown in Example A-6 could have been achieved with the com- 
mand "-T 5", assuming that IP = 0100. Experiment with the various forms of the trace 
command. "T" with no starting or count fields will execute one instruction starting at 
CS:IP. If no first field is given, CS:IP is assumed. If no second field is given, 1 is 
assumed. 

If you trace a large number of instructions, they may scroll upward off the screen
faster than you can read them. <Ctrl-num lock> can be used to stop the scrolling
temporarily. To resume the scrolling, press any key. This works not only for the trace
command, but for any command that displays information on the screen.


Example A-6 : Trace Command 


-T =100 5

AX=0001 BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0103 NV UP DI PL NZ NA PO NC
103D:0103 BB0200 MOV BX,0002


AX=0001 BX=0002 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0106 NV UP DI PL NZ NA PO NC
103D:0106 B90300 MOV CX,0003


AX=0001 BX=0002 CX=0003 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0109 NV UP DI PL NZ NA PO NC
103D:0109 01D8 ADD AX,BX


AX=0003 BX=0002 CX=0003 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=010B NV UP DI PL NZ NA PE NC
103D:010B 01C8 ADD AX,CX


AX=0006 BX=0002 CX=0003 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=010D NV UP DI PL NZ NA PE NC
103D:010D CC INT 3
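
The register values in Example A-6 can be reproduced with a tiny simulation. The Python sketch below is only a teaching aid with hypothetical names: it models just the MOV and ADD forms used in the example and tracks only AX, BX, and CX, ignoring flags, IP, and every other part of the CPU.

```python
# The five instructions traced in Example A-6.
PROGRAM = [
    ("MOV", "AX", 1),
    ("MOV", "BX", 2),
    ("MOV", "CX", 3),
    ("ADD", "AX", "BX"),
    ("ADD", "AX", "CX"),
]


def trace(program):
    """Execute each instruction and record the registers after it."""
    regs = {"AX": 0, "BX": 0, "CX": 0}
    history = []
    for op, dst, src in program:
        value = regs[src] if isinstance(src, str) else src
        if op == "MOV":
            regs[dst] = value & 0xFFFF
        elif op == "ADD":
            regs[dst] = (regs[dst] + value) & 0xFFFF   # keep 16 bits
        else:
            raise ValueError("unsupported instruction: " + op)
        history.append(dict(regs))
    return history


for step in trace(PROGRAM):
    print(" ".join(f"{name}={value:04X}" for name, value in step.items()))
```

Each printed line corresponds to one register display in the trace, with AX going from 0001 to 0003 and finally 0006.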




Loading the 8-bit and 16-bit registers is shown in Example A-7.


Example A-7 : Moving Data into 8- and 16-bit Registers


C:\>DEBUG

-R

AX=0000 BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0100 NV UP DI PL NZ NA PO NC
103D:0100 B664 MOV DH,64

-A 100
103D:0100 MOV AL,3F
103D:0102 MOV BH,04
103D:0104 MOV CX,FFFF
103D:0107 MOV CL,BH
103D:0109 MOV CX,1
103D:010C INT 3
103D:010D

-T =100 5

AX=003F BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0102 NV UP DI PL NZ NA PO NC
103D:0102 B704 MOV BH,04


AX=003F BX=0400 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0104 NV UP DI PL NZ NA PO NC
103D:0104 B9FFFF MOV CX,FFFF


AX=003F BX=0400 CX=FFFF DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0107 NV UP DI PL NZ NA PO NC
103D:0107 88F9 MOV CL,BH


AX=003F BX=0400 CX=FF04 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0109 NV UP DI PL NZ NA PO NC
103D:0109 B90100 MOV CX,0001


AX=003F BX=0400 CX=0001 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=010C NV UP DI PL NZ NA PO NC
103D:010C CC INT 3


Example A-8 is stored starting at CS:IP of 1132:0100. This logical address cor- 
responds to physical address 11420 (11320 + 0100). 
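
The logical-to-physical conversion used above (shift the segment left one hex digit, then add the offset) can be verified with a short Python sketch. The function name is our own, and the 20-bit mask reflects the 1-megabyte address space of real mode, which we assume here.

```python
def physical_address(segment, offset):
    """Convert a real-mode segment:offset pair to a 20-bit physical address."""
    return ((segment << 4) + offset) & 0xFFFFF   # segment x 16 + offset


print(f"{physical_address(0x1132, 0x0100):05X}")   # 11420
print(f"{physical_address(0x103D, 0x0100):05X}")   # 104D0
```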


SECTION A.4: DATA MANIPULATION IN DEBUG 


Next are described three DEBUG commands that are used to examine or alter the 
contents of memory. 


F the fill command fills a block of memory with data 
D the dump command displays contents of memory to the screen 
E the enter command examines/alters the contents of memory 


F, the fill command: filling memory with data 


The fill command is used to fill an area of memory with a data item. The syntax 
of the F command is as follows: 


F <starting address> <ending address> <data>
F <starting address> <L number of bytes> <data>




Example A-8 : Assembling and Unassembling a Program


C:\>DEBUG

-R

AX=0000 BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=1132 ES=1132 SS=1132 CS=1132 IP=0100 NV UP DI PL NZ NA PO NC
1132:0100 BED548 MOV SI,48D5

-A 100
1132:0100 MOV AL,57
1132:0102 MOV DH,86
1132:0104 MOV DL,72
1132:0106 MOV CX,DX
1132:0108 MOV BH,AL
1132:010A MOV BL,9F
1132:010C MOV AH,20
1132:010E ADD AX,DX
1132:0110 ADD CX,BX
1132:0112 ADD AX,1F35
1132:0115

-U 100 112
1132:0100 B057      MOV AL,57
1132:0102 B686      MOV DH,86
1132:0104 B272      MOV DL,72
1132:0106 89D1      MOV CX,DX
1132:0108 88C7      MOV BH,AL
1132:010A B39F      MOV BL,9F
1132:010C B420      MOV AH,20
1132:010E 01D0      ADD AX,DX
1132:0110 01D9      ADD CX,BX
1132:0112 05351F    ADD AX,1F35


This command is useful for filling a block of memory with data, for example, to 
initialize an area of memory with zeros. Normally, you will want to use this command to 
fill areas of the data segment, in which case the starting and ending addresses would be 
offset addresses into the data segment. To fill another segment, the register should precede 
the offset. For example, the first command below would fill 16 bytes, from DS:100 to 
DS:10F with FF. The second command would fill a 256-byte block of the code segment, 
from CS:100 to CS:1FF with ASCII 20 (space). 


F 100 10F FF
F CS:100 1FF 20


Example A-9 demonstrates the use of the F command. The data can be a series 
of items, in which case DEBUG will fill the area of memory with that pattern of data, 
repeating the pattern over and over. For example: 


F 100 L20 00 FF


The command above would cause 20 hex bytes (32 decimal) starting at DS:100 
to be filled alternately with 00 and FF. 
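The pattern-repeating behavior of the F command can be modeled in a few lines. This is only a sketch in Python under stated assumptions: debug_fill is a hypothetical name, and the bytearray stands in for one 64K segment.

```python
def debug_fill(memory, start, length, pattern):
    """Model of 'F start L length pattern': repeat the pattern over the block."""
    for i in range(length):
        memory[start + i] = pattern[i % len(pattern)]


segment = bytearray(0x10000)                       # stand-in for the data segment
debug_fill(segment, 0x100, 0x20, [0x00, 0xFF])     # F 100 L20 00 FF
print(segment[0x100:0x108].hex(" ").upper())       # prints 00 FF 00 FF 00 FF 00 FF
```

Only the 20H bytes from offset 100 through 11F are touched; the byte at offset 120 is left as it was.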


D, the dump command: examining the contents of memory 


The dump command is used to examine the contents of memory. The syntax of 
the D command is as follows: 


D <start address > <end address> 
D <start address > <L number of bytes> 


APPENDIX A: DEBUG PROGRAMMING 707 


Example A-9 : Filling and Dumping a Block of Memory 


(a) Fill and dump commands 

C:>DEBUG 

-F 100 14F 20 

-F 150 19F 00

-D 100 19F 
103D:0100 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20 
103D:0110 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20 
103D:0120 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20 
103D:0130 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20 
103D:0140 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20 
103D:0150 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
103D:0160 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
103D:0170 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
103D:0180 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
103D:0190 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
(b) Filling and dumping selected memory locations 

-F 104 10A FF 

-D 104 10A 
103D:0104 FF FF FF FF-FF FF FF 

-D 100 10F
103D:0100 20 20 20 20 FF FF FF FF-FF FF FF 20 20 20 20 20 


(c) Filling and dumping code segment memory 

-F CS:100 12F 20

-D CS:100 12F
103D:0100 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20 
103D:0110 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20 
103D:0120 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20 


The D command can be entered with a starting and ending address, in which case 
it will display all the bytes between those locations. It can also be entered with a starting 
address and a number of bytes (in hex), in which case it will display from the starting 
address for that number of bytes. If the address is an offset, DS is assumed. The D com- 
mand can also be entered by itself, in which case DEBUG will display 128 consecutive 
bytes beginning at DS:100. The next time "D" is entered by itself, DEBUG will display 
128 bytes beginning at wherever the last display command left off. In this way, one can 
easily look through a large area of memory, 128 bytes at a time. 

Example A-9 demonstrates the use of the fill and dump commands. Part (a)
shows two fill commands to fill areas of the data segment, which are then dumped. Part 
(b) was included to show that small areas of memory can be filled and dumped. Part (c) 
shows how to fill and dump to memory from other segments. Keep in mind that the val- 
ues for DS and CS may be different on your machine. 


E, the enter command: entering data into memory 


The fill command was used to fill a block with the same data item. The enter com- 
mand can be used to enter a list of data into a certain portion of memory. The syntax of 
the E command is as follows: 


E <address> <data list> 
E <address> 


Example A-10 : Using the E Command to Enter Data into Memory


(a) Entering data with the E command
-E 100 'John Snith'
-D 100 10F
103D:0100 4A 6F 68 6E 20 53 6E 69-74 68 20 20 20 20 20 20 John Snith


(b) Altering data with the E command
-E 106
103D:0106 6E.6D
-D 100 10F
103D:0100 4A 6F 68 6E 20 53 6D 69-74 68 20 20 20 20 20 20 John Smith


(c) Another way to alter data with the E command, hitting the
space bar to go through the data a byte at a time
-E 100
103D:0100 4A. 6F. 68. 6E. 20. 53. 6E.6D
-D 100 10F
103D:0100 4A 6F 68 6E 20 53 6D 69-74 68 20 20 20 20 20 20 John Smith


(d) Another way to alter data with the E command
-E 107
103D:0107 69.-
103D:0106 6E.6D


Part (a) of Example A-10 shows the simplest use of the E command, entering the
starting address, followed by the data. That example shows how to enter ASCII data,
which can be enclosed in either single or double quotes. The E command has another
powerful feature: the ability to examine and alter memory byte by byte. If the E command
is entered with a specific address and no data list, DEBUG assumes that you wish
to examine that byte of memory and possibly alter it. After that byte is displayed, you
have four options:


1. You can enter a new data item for that byte. DEBUG will replace the old contents 
with the new value you typed in. 

2. You can press <return>, which indicates that you do not wish to change the value. 

3. You can press the space bar, which will leave the displayed byte unchanged but will 
display the next byte and give you a chance to change that if you wish. 

4. You can enter a minus sign, "-", which will leave the displayed byte unchanged but 
will display the previous byte and give you a chance to change it. 


Look at part (b) in Example A-10. The user wants to change "Snith" to "Smith". 
After the user typed in "E 106", DEBUG responded with the contents of that byte, 6E, 
which is ASCII for n, and prompted with a ".". Then the user typed in the ASCII code for 
"m", 6D, entered a carriage return, and then dumped the data to see if the correction was 
made. Part (c) of Example A-10 showed another way to make the same correction. The 
user started at memory offset 100 and pressed the space bar continuously until the desired 
location was reached. Then he made the correction and pressed carriage return. 

Finally, part (d) showed a third way the same correction could have been made. 
In this example, the user accidentally entered the wrong address. The address was one 
byte past the one that needed correction. The user entered a minus sign, which caused 
DEBUG to display the previous byte on the next line. Then the correction was made to 
that byte. Try these examples yourself since the E command will prove very useful in 
debugging your future programs. 

The E command can be used to enter numerical data as well: 


E 100 23 BA 02 47


Example A-11 gives an example of entering code with the assemble command,
entering data with the enter command, and running the program.




Example A-11 : Entering Data and Code and Running a Program 


C:>DEBUG
-A 100
103D:0100 MOV AL,00
103D:0102 ADD AL,[0200]
103D:0106 ADD AL,[0201]
103D:010A ADD AL,[0202]
103D:010E ADD AL,[0203]
103D:0112 ADD AL,[0204]
103D:0116 INT 3
103D:0117
-E DS:0200 25 12 15 1F 2B
-D DS:0200 020F
103D:0200 25 12 15 1F 2B 02 00 E8-51 FF C3 E8 1E F6 74 03 %...+..hQ.Ch.vt.
-G =100 116
AX=0096 BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0116 OV UP DI NG NZ AC PE NC
103D:0116 CC INT 3


SECTION A.5: EXAMINING/ALTERING THE FLAG REGISTER IN DEBUG


The discussion of how to use the R command to examine/alter the contents of the 
flag register was postponed until this section, so that program examples that affect the flag 
bits could be included. Table A-1, on the following page, gives the codes for 8 bits of the 
flag register, which are displayed whenever a G, T, or R DEBUG command is given. 


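Table A-1: Codes for the Flag Register


Flag                        Code When Set (= 1)       Code When Reset (= 0)
OF overflow flag            OV (overflow)             NV (no overflow)
DF direction flag           DN (down)                 UP (up)
IF interrupt flag           EI (enable interrupt)     DI (disable interrupt)
SF sign flag                NG (negative)             PL (plus, or positive)
ZF zero flag                ZR (zero)                 NZ (not zero)
AF auxiliary carry flag     AC (auxiliary carry)      NA (no auxiliary carry)
PF parity flag              PE (parity even)          PO (parity odd)
CF carry flag               CY (carry)                NC (no carry)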


If all the bits of the flag register were reset to zero, as is the case when DEBUG 
is first entered, the following would be displayed for the flag register: 


NV UP DI PL NZ NA PO NC


Similarly, if all the flag bits were set to 1, the following would be seen:


OV DN EI NG ZR AC PE CY


Example A-12 shows how the flag register can be examined, or examined and then
altered, with the R command. When the R command is followed by "F", this tells DEBUG
to display the contents of the flag register. After DEBUG displays the flag register codes,
it prompts with another "-" at the end of the line of register codes. At this point, flag
register codes may be typed in to alter the flag register, or a simple carriage return may
be typed in if no changes are needed. The register codes may be typed in any order.


Example A-12 : Changing the Flag Register Contents 


-R F
NV UP DI PL NZ NA PO NC -DN OV NG


-R F
OV DN DI NG NZ NA PO NC -


Impact of instructions on the flag bits 


Example A-13, on the following page, shows the effect of ADD instructions on 
the flag register. The ADD in part (a) involves byte addition. Adding 9C and 64 results in 
00 with a carry out. The flag bits indicate that this was the result. Notice the zero flag is 
now ZR, indicating that the result is zero. In addition, the carry flag was set, indicating the 
carry out. The ADD in part (b) involves word addition. Notice that the sign flag was set 
to NG after the ADD instruction was executed. This is because the result, CAE0, in its
binary form will have a 1 in bit 15, the sign bit. Since we are dealing with unsigned addition,
we interpret this number to be positive CAE0H, not a negative number. This points
out the fact that the microprocessor treats all data the same. It is up to the programmer to 
interpret the meaning of the data. Finally, look at the ADD in part (c). Adding AAAAH 
and 5556H gives 10000H, which results in BX = 0000 with a carry out. The zero flag 
indicates the zero result (BX = 0000), while the carry flag indicates that a carry out 
occurred. 
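The flag settings discussed above can be reproduced with a small model of the ADD instruction. This is a sketch in Python, not 8086 code; add_flags is a hypothetical helper, and it assumes the standard definitions of the six status flags (parity is taken over the low byte only).

```python
def add_flags(a, b, bits=8):
    """Model of ADD for 8- or 16-bit operands; returns (result, flags)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    total = (a & mask) + (b & mask)
    result = total & mask
    flags = {
        "CF": int(total > mask),                             # carry out of the top bit
        "ZF": int(result == 0),                              # result is zero
        "SF": int(bool(result & sign)),                      # copy of the top bit
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),              # carry from D3 to D4
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),   # even parity of low byte
        "OF": int(bool(~(a ^ b) & (a ^ result) & sign)),     # signed overflow
    }
    return result, flags


print(add_flags(0x9C, 0x64))              # part (a): byte addition
print(add_flags(0x34F5, 0x95EB, bits=16)) # part (b): word addition
print(add_flags(0xAAAA, 0x5556, bits=16)) # part (c): word addition
```

For part (a) the model gives a result of 00 with ZF, AF, PF, and CF set, which corresponds to the codes ZR AC PE CY in the DEBUG display.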

Examples A-14 and A-15 show how to code a simple program in DEBUG, set up 
the desired data, and execute the program. This program includes a conditional jump that 
will decide whether to jump based on the value of the zero flag. This example also points 
out some important differences between coding a program in DEBUG and coding a pro- 
gram for an assembler such as MASM. First notice the JNZ instruction. If this were an 
Assembly language program, the instruction might be "JNZ LOOP_ADD", where the 
label LOOP_ADD refers to a line of code. In DEBUG we simply JNZ to the address. 
Another important difference is that an Assembly language program would have separate 
data and code segments. In Example A-14, the test data was entered at offset 0200, and 
consequently, BX was set to 0200 since it is being used as a pointer to the data. In an 
Assembly language program, the data would have been set up in the data segment and the 
instruction might have been "MOV BX,OFFSET DATA1" where DATA1 is the label asso- 
ciated with the data directive that stored the data. 




Example A-13 : Observing Changes in the Flag Register 


C:>DEBUG

(a)
-A 100
103D:0100 MOV AL,9C
103D:0102 MOV DH,64
103D:0104 ADD AL,DH
103D:0106 INT 3
103D:0107
-T 3
AX=009C BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0102 NV UP DI PL NZ NA PO NC
103D:0102 B664 MOV DH,64

AX=009C BX=0000 CX=0000 DX=6400 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0104 NV UP DI PL NZ NA PO NC
103D:0104 00F0 ADD AL,DH

AX=0000 BX=0000 CX=0000 DX=6400 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0106 NV UP DI PL ZR AC PE CY
103D:0106 CC INT 3


(b)
-A 100
103D:0100 MOV AX,34F5
103D:0103 ADD AX,95EB
103D:0106 INT 3
103D:0107
-T =100 2
AX=34F5 BX=0000 CX=0000 DX=6400 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0103 NV UP DI PL NZ NA PO NC
103D:0103 05EB95 ADD AX,95EB

AX=CAE0 BX=0000 CX=0000 DX=6400 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0106 NV UP DI NG NZ AC PO NC
103D:0106 CC INT 3


(c)
-A 100
103D:0100 MOV BX,AAAA
103D:0103 ADD BX,5556
103D:0107 INT 3
103D:0108
-G =100 107
AX=34F5 BX=0000 CX=0000 DX=6400 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0107 NV UP DI PL ZR AC PE CY
103D:0107 CC INT 3




Example A-14 : Tracing through a Program to Add 5 Bytes 


C:>DEBUG
-A 100
103D:0100 MOV CX,05
103D:0103 MOV BX,0200
103D:0106 MOV AL,0
103D:0108 ADD AL,[BX]
103D:010A INC BX
103D:010B DEC CX
103D:010C JNZ 0108
103D:010E MOV [0205],AL
103D:0111 INT 3
103D:0112
-E 0200 25 12 15 1F 2B
-D 0200 020F
103D:0200 25 12 15 1F 2B 9A DE CE-1E F3 20 20 20 20 20 20 %...+.^N.s
-G =100 111
AX=0096 BX=0205 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0000 DI=0000
DS=103D ES=103D SS=103D CS=103D IP=0111 NV UP DI PL ZR NA PE NC
103D:0111 CC INT 3
-D 0200 020F
103D:0200 25 12 15 1F 2B 96 DE CE-1E F3 20 20 20 20 20 20 %...+.^N.s


Example A-15 : Data Transfer Program in DEBUG 


C:>DEBUG
-A 100
103D:0100 MOV SI,0210
103D:0103 MOV DI,0228
103D:0106 MOV CX,6
103D:0109 MOV AL,[SI]
103D:010B MOV [DI],AL
103D:010D INC SI
103D:010E INC DI
103D:010F DEC CX
103D:0110 JNZ 0109
103D:0112 INT 3
103D:0113
-E 0210 25 4F 85 1F 2B C4
-D 0210 022F
103D:0210 25 4F 85 1F 2B C4 43 0C-01 01 01 00 02 FF FF FF %O..+DC.........
103D:0220 FF FF FF FF FF FF FF FF-FF FF FF FF 45 0D CA 2A ............E.J*
-G =100
AX=00C4 BX=0000 CX=0000 DX=0000 SP=CFDE BP=0000 SI=0216 DI=022E
DS=103D ES=103D SS=103D CS=103D IP=0112 NV UP DI PL ZR NA PE NC
103D:0112 CC INT 3
-D 0210 022F
103D:0210 25 4F 85 1F 2B C4 43 0C-01 01 01 00 02 FF FF FF %O..+DC.........
103D:0220 FF FF FF FF FF FF FF FF-25 4F 85 1F 2B C4 CA 2A ........%O..+DJ*




Table A-2: Summary of DEBUG Commands 


Function     Command Syntax
Assemble     A <starting address>
Compare      C <start address> <end address> <compare address>
             C <start address> <L number of bytes> <compare address>
Dump         D <start address> <end address>
             D <start address> <L number of bytes>
Enter        E <address> <data list>
             E <address>
Fill         F <start address> <end address> <data>
             F <start address> <L number of bytes> <data>
Go           G <= start address> <end address(es)>
Hex          H <number 1> <number 2>
Load         L <start address> <drive> <start sector> <sectors>
Move         M <start address> <end address> <destination>
             M <start address> <L number of bytes> <destination>
Name         N <filename>
Procedure    P <= start address> <number of instructions>
Register     R <register name>
Search       S <start address> <end address> <data>
             S <start address> <L number of bytes> <data>
Trace        T <= start address> <number of instructions>
Unassemble   U <start address> <end address>
             U <start address> <L number of bytes>
Write        W <start address> <drive> <start sector> <sectors>


Notes:

1. All addresses and numbers are given in hex.

2. Commands may be entered in lowercase or uppercase, or a combination.

3. Ctrl-C will stop any command.

4. Ctrl-Num Lock will stop scrolling of command output. To resume scrolling, press any key.




APPENDIX B 


x86 INSTRUCTIONS 
DESCRIPTION 


OVERVIEW 


In this appendix, we list the instructions of the 8086, give 
their format and expected operands, and describe the function of 
each instruction. Where pertinent, programming examples have 
been given. These instructions will operate on any x86 processor. 
There are additional instructions for the 80186 processor and 
above (80286, 80386, 80486, and Pentium); however, these instruc- 
tions are not given in this list and can be found in their datasheets. 




SECTION B.1: THE 8086 INSTRUCTION SET 


AAA ASCII Adjust after Addition 
Flags: Affected: AF and CF. Unpredictable: OF, SF, ZF, PF.
Format: AAA


Function: This instruction is used after an ADD instruction has added two digits
in ASCII code. This makes it possible to add ASCII numbers without masking off the
upper nibble "3". The result will be unpacked BCD in AL with the carry flag set if needed.
This instruction adjusts only the AL register. AH is incremented if the carry flag
is set.


Example 1: 
MOV AL,31H    ;AL=31 THE ASCII CODE FOR 1
ADD AL,37H    ;ADD 37 (ASCII FOR 7) TO AL; NOW AL=68H
AAA           ;AL=08 AND CF=0


In the example above, ASCII 1 (31H) is added to ASCII 7 (37H). After the AAA
instruction, AL will contain 8 in BCD and CF = 0. The following example shows another
ASCII addition and then the adjustment:


Example 2:
MOV AL,'9'    ;AL=39 ASCII FOR 9
ADD AL,'5'    ;ADD 35 (ASCII FOR 5) TO AL THEN AL=6EH
AAA           ;NOW AL=04 CF=1
OR AL, 30H ;converts result to ASCII 
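
The two AAA examples can be traced with a short model. This is a sketch in Python under stated assumptions: ascii_add is a hypothetical helper that performs the byte ADD followed by the AAA adjustment as described above (add 6 and increment AH when the low nibble exceeds 9 or AF = 1, then clear the upper nibble of AL).

```python
def ascii_add(al, operand, ah=0):
    """Model of 'ADD AL,operand' followed by AAA; returns (AL, AH, CF)."""
    af = int((al & 0x0F) + (operand & 0x0F) > 0x0F)  # auxiliary carry from the ADD
    al = (al + operand) & 0xFF
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF
        ah = (ah + 1) & 0xFF
        cf = 1
    else:
        cf = 0
    return al & 0x0F, ah, cf


print(ascii_add(0x31, 0x37))  # Example 1: prints (8, 0, 0)
print(ascii_add(0x39, 0x35))  # Example 2: prints (4, 1, 1)
```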
AAD ASCII Adjust before Division 
Flags: Affected: SF, ZF, PF. Unpredictable: OF, AF, CF. 
Format: AAD 


Function: Used before the DIV instruction to convert two unpacked BCD digits 
in AL and AH to binary. A better name for this would be BCD to binary conversion before 
division. This allows division of ASCII numbers. Before the AAD instruction is execut- 
ed, the ASCII tag of 3 must be masked from the upper nibble of AH and AL. 


Example: 

MOV AX,3435H  ;AX=3435 THE ASCII FOR 45
AND AX,0F0FH  ;AX=0405H UNPACKED BCD FOR 45
AAD           ;AX=002DH HEX FOR 45
MOV DL,07H    ;DL=07
DIV DL        ;2DH DIV BY 07 GIVES AL=06, AH=03
OR  AX,3030H  ;AL=36=QUOTIENT AND AH=33=REMAINDER
AAM ASCII Adjust after Multiplication
Flags: Affected: SF, ZF, PF. Unpredictable: OF, AF, CF.
Format: AAM


Function: Again, a better name would have been BCD adjust after multiplication.
It is used after the MUL instruction has multiplied two unpacked BCD numbers. It converts
AX from binary to unpacked BCD. AAM adjusts only AL, and any digits greater
than 9 are stored in AH.


Example:
MOV AL,'5'    ;AL=35
AND AL,0FH    ;AL=05 UNPACKED BCD FOR 5
MOV BL,'4'    ;BL=34
AND BL,0FH    ;BL=04 UNPACKED BCD FOR 4
MUL BL        ;AX=0014H=20 DECIMAL
AAM           ;AX=0200
OR  AX,3030H  ;AX=3230 ASCII FOR 20
AAS ASCII Adjust after Subtraction 
Flags: Affected: AF, CF. Unpredictable: OF, SF, ZF, PF.
Format: AAS


Function: After the subtraction of two ASCII digits, this instruction is used to
convert the result in AL to unpacked BCD. Only AL is adjusted; the value in AH will be
decremented if the carry flag is set.


Example:
MOV AL,32H    ;AL=32 ASCII FOR 2
MOV DH,37H    ;DH=37 ASCII FOR 7
SUB AL,DH     ;AL-DH=32-37=FBH WHICH IS -5 IN 2'S COMP
              ;CF=1 INDICATING A BORROW
AAS           ;NOW AL=05 AND CF=1
ADC Add with Carry 
Flags: Affected: OF, SF, ZF, AF, PF, CF.
Format: ADC dest,source ;dest = dest + source + CF


Function: If CF = 1 prior to this instruction, then after execution of this instruc- 
tion, source is added to destination plus 1. If CF = 0, source is added to destination plus 
0. Used widely in multibyte and multiword additions. 
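
The multiword use of ADC mentioned above can be sketched as follows. This is a Python model, not 8086 code; add_multiword is a hypothetical helper, and it assumes each operand is given as a list of 16-bit words with the least significant word first.

```python
def add_multiword(a_words, b_words):
    """Model of ADD on the lowest word followed by ADC on each higher word."""
    carry = 0                       # CF is 0 going into the first (ADD) step
    result = []
    for a, b in zip(a_words, b_words):
        total = a + b + carry       # dest = dest + source + CF
        result.append(total & 0xFFFF)
        carry = total >> 16         # CF out of this word feeds the next ADC
    return result, carry


# 0001FFFFH + 00020001H = 00040000H
words, cf = add_multiword([0xFFFF, 0x0001], [0x0001, 0x0002])
print([f"{w:04X}" for w in words], cf)  # prints ['0000', '0004'] 0
```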


ADD Signed or Unsigned ADD 
Flags: Affected: OF, SF, ZF, AF, PF, CF.
Format: ADD dest,source ;dest = dest + source


Function: Adds source operand to destination operand and places the result in
destination. Both source and destination operands must match (e.g., both byte size or
word size) and only one of them can be in memory.


Unsigned addition:


In addition of unsigned numbers, the status of CF, ZF, SF, AF, and PF may change,
but only CF, ZF, and AF are of any use to programmers. The most important of these flags
is CF. It becomes 1 when there is a carry from D7 out in 8-bit (D0-D7) operations, or a
carry from D15 out in 16-bit (D0-D15) operations.


Example 1:
MOV BH,45H    ;BH=45H
ADD BH,4FH    ;BH=94H (45H+4FH=94H)
              ;CF=0,ZF=0,SF=1,AF=1,and PF=0

Example 2:
MOV AL,0FEH   ;AL=FEH
MOV DL,75H    ;DL=75H
ADD AL,DL     ;AL=FE+75=73H
              ;CF=1,ZF=0,AF=1,SF=0,PF=0

Example 3:
MOV DX,126FH  ;DX=126FH
ADD DX,3465H  ;DX=46D4H (126F+3465=46D4H)
              ;CF=0,ZF=0,AF=1,SF=0,PF=1

MOV BX,0FFFFH
ADD BX,1      ;BX=0000 (FFFFH+1=0000)
              ;AND CF=1,ZF=1,AF=1,SF=0,PF=1


Signed addition: 


In addition of signed numbers, the status of OF, ZF, and SF must be noted. 
Special attention should be given to the overflow flag (OF) since this indicates if there is 
an error in the result of the addition. There are two rules for setting OF in signed number 
operation. The overflow flag is set to 1: 


1. If there is a carry from D6 to D7 and no carry from D7 out in an 8-bit operation or a
carry from D14 to D15 and no carry from D15 out in a 16-bit operation

2. If there is a carry from D7 out and no carry from D6 to D7 in an 8-bit operation or a
carry from D15 out but no carry from D14 to D15 in a 16-bit operation


Notice that if there is a carry both from D7 out and from D6 to D7, then OF = 0 
in 8-bit operations. In 16-bit operations, OF = 0 if there is both a carry out from D15 and 
a carry from D14 to D15. 
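
The two rules above amount to saying that OF is the exclusive-OR of the carry into the sign bit and the carry out of it. The following Python sketch models that for 8-bit operands; signed_add8 is a hypothetical helper introduced only for illustration.

```python
def signed_add8(a, b):
    """Model of an 8-bit signed ADD; returns (signed result, OF, CF)."""
    a &= 0xFF
    b &= 0xFF
    carry_d6_to_d7 = ((a & 0x7F) + (b & 0x7F)) >> 7  # carry into the sign bit
    carry_d7_out = (a + b) >> 8                      # carry out of the sign bit (CF)
    result = (a + b) & 0xFF
    overflow = carry_d6_to_d7 ^ carry_d7_out         # set when exactly one carry occurs
    signed = result - 256 if result & 0x80 else result
    return signed, overflow, carry_d7_out


print(signed_add8(+66, +69))     # prints (-121, 1, 0): overflow, result is wrong
print(signed_add8(-12, +18))     # prints (6, 0, 1): both carries occur, result is correct
print(signed_add8(-126, -127))   # prints (3, 1, 1): overflow, result is wrong
```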


Example 4:
MOV BL,+8     ;BL=0000 1000
MOV DH,+4     ;DH=0000 0100
ADD BL,DH     ;BL=0000 1100 SF=0,ZF=0,OF=0,CF=0


Notice SF = D7 = 0 since the result is positive and OF = 0 since there is neither a
carry from D6 to D7 nor any carry beyond D7. Since OF = 0, the result is correct [(+8)
+ (+4) = (+12)].


Example 5:
MOV AL,+66    ;AL=0100 0010
MOV CL,+69    ;CL=0100 0101
ADD CL,AL     ;CL=1000 0111 = -121 (INCORRECT)
              ;CF=0,SF=1,ZF=0, AND OF=1


In Example 5, the correct result is +135 [(+66) + (+69) = (+135)], but the result
was -121. The OF = 1 is an indication of this error. Notice that SF = D7 = 1 since the
result is negative; OF = 1 since there is a carry from D6 to D7 and CF = 0.


Example 6:
MOV AL,-12    ;AL=1111 0100
MOV BL,+18    ;BL=0001 0010
ADD BL,AL     ;BL=0000 0110 (WHICH IS +6)
              ;SF=0,ZF=0,OF=0, AND CF=1


Notice above that OF = 0 since there is a carry from D6 to D7 and a carry from
D7 out.


Example 7:
MOV AH,-30    ;AH=1110 0010
MOV DL,+14    ;DL=0000 1110
ADD DL,AH     ;DL=1111 0000 (WHICH IS -16 AND CORRECT)
              ;AND SF=1,ZF=0,OF=0, AND CF=0


OF = 0 since there is no carry from D7 out nor any carry from D6 to D7.


Example 8:
MOV AL,-126   ;AL=1000 0010
MOV BH,-127   ;BH=1000 0001
ADD AL,BH     ;AL=0000 0011 (WHICH IS +3 AND WRONG)
              ;AND SF=0,ZF=0 AND OF=1


OF = 1 since there is a carry from D7 out but no carry from D6 to D7.


AND Logical AND 

Flags: Affected: CF = 0, OF = 0, SF, ZF, PF.
Unpredictable: AF.

Format: AND dest,source


Function: Performs logical AND on the operands, bit by bit, storing the result in
the destination.


Example:
MOV BL,39H    ;BL=39
AND BL,09H    ;BL=09


    39    0011 1001
AND 09    0000 1001
    --    ---------
    09    0000 1001


CALL Call a Procedure 
Flags: Unchanged. 
Format: CALL proc ;transfer control to procedure


Function: Transfers control to a procedure. RET is used to return control to the 
instruction after the call. There are two types of CALLs: NEAR and FAR. If the target
address is within the same code segment, it is a NEAR call. If the target address is out- 
side the current code segment, it is a FAR CALL. Each is described below. 


NEAR CALL: If calling a near procedure (the procedure is in the same code seg- 
ment as the CALL instruction) then the content of the IP register (which is the address of 
the instruction after the CALL) is pushed onto the stack and SP is decremented by 2. Then 
IP is loaded with the new value, which is the offset of the procedure. At the end of the 
procedure when the RET is executed, IP is popped off the stack, which returns control to 
the instruction after the CALL. There are three ways to code the address of the called 
NEAR procedure: 


1. Direct:
CALL proc1

proc1 PROC NEAR
RET

proc1 ENDP


2. Register indirect:


CALL [SI]             ;transfer control to address in SI


3. Memory indirect:
CALL WORD PTR [DI]    ;DI points to the address that
                      ;contains IP address of proc


FAR CALL: When calling a far procedure (the procedure is in a different seg- 
ment from the CALL instruction), the SP is decremented by 4 after CS:IP of the instruc- 
tion following the CALL is pushed onto the stack. CS:IP is then loaded with the seg- 
ment and offset address of the called procedure. In pushing CS:IP onto the stack, CS is 
pushed first and then IP. When the RETF is executed, CS and IP are restored from the 
stack and execution continues with the instruction following the CALL. The following 
addressing modes are supported: 


1. Direct (but outside the present segment): 
CALL proc1

proc1 PROC FAR
RETF

proc1 ENDP

2. Memory indirect:


CALL DWORD PTR [DI]   ;transfer control to CS:IP where
                      ;DI and DI+1 point to location of
                      ;IP and DI+2 and DI+3 point to
                      ;location of CS


CBW Convert Byte to Word 
Flags: Unchanged. 
Format: CBW 


Function: Copies D7 of AL (the sign bit) to all bits of AH. Used widely to convert a
signed byte in AL to a signed word to avoid the overflow problem in signed number arith- 
metic. 
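
The sign extension performed by CBW, and by the related CWD instruction, can be modeled as below. This is a Python sketch with hypothetical helper names cbw and cwd; it assumes only the rule stated in the text, namely that the sign bit is copied into every bit of the upper register.

```python
def cbw(al):
    """Model of CBW: extend the sign bit of AL through AH; returns AX."""
    al &= 0xFF
    ah = 0xFF if al & 0x80 else 0x00
    return (ah << 8) | al


def cwd(ax):
    """Model of CWD: extend the sign bit of AX through DX; returns (DX, AX)."""
    ax &= 0xFFFF
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax


print(f"{cbw(0xFB):04X}")   # AL = -5 = FBH, prints FFFB
print(f"{cbw(0x05):04X}")   # AL = +5, prints 0005
```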


Example: 
MOV AX,0
MOV AL,-5     ;AL=(-5)=FB in 2's complement
              ;AX = 0000 0000 1111 1011
CBW           ;now AX=FFFB
              ;AX = 1111 1111 1111 1011


CLC Clear Carry Flag
Flags: Affected: CF.
Format: CLC


Function: Resets CF to zero (CF = 0). 


CLD Clear Direction Flag 
Flags: Affected: DF.
Format: CLD 


Function: Resets DF to zero (DF = 0). In string instructions if DF = 0, the point- 
ers are incremented with each execution of the instruction. If DF = 1, the pointers are 
decremented. Therefore, CLD is used before string instructions to make the pointers 
increment. 




CLI Clear Interrupt Flag
Flags: Affected: IF.
Format: CLI


Function: Resets IF to zero, thereby masking external interrupts received on
INTR input. Interrupts received on NMI input are not blocked by this instruction. 


CMC Complement Carry Flag 
Flags: Affected: CF.
Format: CMC 


Function: Changes CF from 0 to 1 or from 1 to 0. 


CMP Compare Operands 
Flags: Affected: OF, SF, ZF, AF, PF, CF.
Format: CMP dest,source ;sets flags as if "SUB dest,source"


Function: Compares two operands of the same size. The source and destination
operands are not altered. Performs comparison by subtracting the source operand from
the destination and sets flags as if SUB were performed. The relevant flags for unsigned
operands are as follows:


                  CF    ZF
dest > source      0     0
dest = source      0     1
dest < source      1     0


CMPS/CMPSB/CMPSW Compare Byte or Word String 


Flags: Affected: OF, SF, ZF, AF, PF, CF.
Format: CMPSx 


Function: Compares strings a byte or word at a time. DS:SI is used to address 
the first operand; ES:DI is used to address the second. If DF = 0, it increments the point- 
ers SI and DI. If DF = 1, it decrements the pointers. It can be used with prefix REPE or 
REPNE to compare strings of any length. The comparison is done by subtracting the 
source operand from the destination and sets flags as if SUB were performed. 


CWD Convert Word to Doubleword 
Flags: Unchanged.
Format: CWD


Function: Converts a signed word in AX into a signed doubleword by copying
the sign bit of AX into all the bits of DX. Often used to avoid the overflow problem in
signed number arithmetic.


Example:
MOV DX,0
MOV AX,-5     ;AX=(-5)=FFFB in 2's complement
              ;DX = 0000H
CWD           ;DX = FFFFH


DAA Decimal Adjust after Addition 


Flags: Affected: SF, ZF, AF, PF, CF. Unpredictable: OF.
Format: DAA


Function: This instruction is used after addition of BCD numbers to convert the
result back to BCD. It adds 6 to the lower 4 bits of AL if it is greater than 9 or if AF = 1.
Then it adds 6 to the upper 4 bits of AL if it is greater than 9 or if CF = 1. 
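
The adjustment rule can be tried out with a short model before looking at the examples. This is a sketch in Python, not 8086 code; bcd_add is a hypothetical helper that performs the byte ADD and then the DAA step. It assumes the commonly documented form of the upper-nibble test, which compares the unadjusted AL against 99H; for the examples below this gives the same results as the description above.

```python
def bcd_add(al, operand):
    """Model of 'ADD AL,operand' followed by DAA; returns (AL, CF)."""
    af = int((al & 0x0F) + (operand & 0x0F) > 0x0F)  # auxiliary carry from the ADD
    total = al + operand
    cf = total >> 8                                  # carry from the ADD
    al = total & 0xFF
    old_al = al
    if (al & 0x0F) > 9 or af:
        al += 0x06                                   # fix the lower digit
    if old_al > 0x99 or cf:
        al += 0x60                                   # fix the upper digit
        cf = 1
    return al & 0xFF, cf


print(tuple(f"{v:X}" for v in bcd_add(0x47, 0x38)))  # prints ('85', '0')
print(tuple(f"{v:X}" for v in bcd_add(0x54, 0x87)))  # prints ('41', '1')
```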


Example 1: 
MOV AL,47H    ;AL=0100 0111
ADD AL,38H    ;AL=47H+38H=7FH, INVALID BCD
DAA           ;NOW AL=1000 0101 (85H IS VALID BCD)


In this example, since the lower nibble was larger than 9, DAA added 6 to AL. If
the lower nibble is not larger than 9 but AF = 1, it also adds 6 to the lower nibble.


Example 2:
MOV AL,29H    ;AL=0010 1001
ADD AL,18H    ;AL=0100 0001 INCORRECT RESULT
DAA           ;AL=0100 0111 A VALID BCD FOR 47H.


The same thing can happen for the upper nibble. 


Example 3: 
MOV AL, 52H ;AL=0101 0010 
ADD AL, 91H ;AL=1110 0011 AN INVALID BCD 
DAA ;AL=0100 0011 AND CF=1 


Again the upper nibble can be smaller than 9 but because CF = 1, it must be cor- 
rected. 


Example 4: 
MOV AL,94H    ;AL=1001 0100
ADD AL,91H    ;AL=0010 0101 INCORRECT RESULT
DAA           ;AL=1000 0101 A VALID BCD FOR 85 AND CF=1


It is entirely possible that 6 is added to both the high and low nibbles. 


Example 5: 

MOV AL,54H    ;AL=0101 0100
ADD AL,87H    ;AL=1101 1011 INVALID BCD
DAA           ;AL=0100 0001 AND CF=1 (141 IN BCD)


DAS Decimal Adjust after Subtraction
Flags: Affected: SF, ZF, AF, PF, CF. Unpredictable: OF.
Format: DAS


Function: This instruction is used after subtraction of BCD numbers to convert
the result to BCD. If the lower 4 bits of AL represent a number greater than 9 or if AF =
1, then 6 is subtracted from the lower nibble. If the upper 4 bits of AL are now greater
than 9 or if CF = 1, 6 is subtracted from the upper nibble.


Example:
MOV AL,45H    ;AL=0100 0101 BCD FOR 45
SUB AL,17H    ;AL=0010 1110 AN INVALID BCD
DAS           ;AL=0010 1000 BCD FOR 28 (45-17=28)


For more examples of problems associated with BCD arithmetic, see DAA. 




DEC Decrement 


Flags: Affected: OF, SF, ZF, AF, PF. Unchanged: CF.
Format: DEC dest ;dest = dest - 1


Function: Subtracts 1 from the destination operand. Note that CF (carry/ borrow) 
is unchanged even if a value 0000 is decremented and becomes FFFF. 


DIV Unsigned Division 
Flags: Unpredictable: OF, SF, ZF, AF, PF, CF.
Format: DIV source ;divide AX or DX:AX by source 


Function: Divides either an unsigned word (AX) by a byte or an unsigned dou- 
bleword (DX:AX) by a word. If dividing a word by a byte, the quotient will be in AL and 
the remainder in AH. If dividing a doubleword by a word, the quotient will be in AX and 
the remainder in DX. Divide by zero causes interrupt type 0. 


ESC Escape 
Flags: Unchanged. 
Format: ESC 


Function: This instruction facilitates the use of math coprocessors (such as the 
8087), which share data and address buses with the microprocessor. ESC is used to pass 
an instruction to a coprocessor and is usually treated as NOP (no operation) by the main
processor.

HLT Halt 

Flags: Unchanged. 
Format: HLT


Function: Causes the microprocessor to halt execution of instructions. To get 
out of the halt state, activate an interrupt (NMI or INTR) or RESET. 


IDIV Signed Number Division 
Flags: Unpredictable: OF, SF, ZF, AF, PF, CF.
Format: IDIV source ;divide AX or DX:AX by source


Function: This division function divides either a signed word (AX) by a byte or 
a signed doubleword (DX:AX) by a word. If dividing a word by a byte, the signed quo- 
tient will be in AL and the signed remainder in AH. If dividing a doubleword by a word, 
the signed quotient will be in AX and the signed remainder in DX. Divide by zero caus- 
es interrupt type 0. 


IMUL Signed Number Multiplication 
Flags: Affected: OF, CF. Unpredictable: SF, ZF, AF, PF.
Format: IMUL source ;AX = source x AL or DX:AX = source x AX


Function: Multiplies a signed byte or word source operand by a signed byte or 
word in AL or AX with the result placed in AX or DX:AX. 


IN Input Data from Port 
Flags: Unchanged.
Format: IN accumulator,port ;input byte into AL or
                            ;word into AX


Function: Transfers a byte or word to AL or AX from an input port specified by 
the second operand. The port address can be direct or register indirect: 


1. Direct: the port address is specified directly and cannot be larger than FFH. 


Example 1:
IN AL,99H     ;BRING A BYTE INTO AL FROM PORT 99H

Example 2:
IN AX,78H     ;BRING A WORD FROM PORT ADDRESSES 78H
              ;AND 79H. THE BYTE FROM PORT 78 GOES
              ;TO AL AND BYTE FROM PORT 79 TO AH.


2. Register indirect: the port address is kept by the DX register. Therefore, it can be as
high as FFFFH.

Example 3:
MOV DX,481H   ;DX=481H
IN AL,DX      ;BRING THE BYTE TO AL FROM THE PORT
              ;WHOSE ADDRESS IS POINTED TO BY DX

Example 4:
IN AX,DX      ;BRING A WORD FROM PORT ADDRESS POINTED
              ;TO BY DX. THE BYTE FROM PORT AT
              ;DX GOES TO AL AND BYTE FROM PORT AT
              ;DX+1 TO AH.


INC increment 
Flags: Affected: OF, SF, ZF, AF, PF. Unchanged: CF. 
Format: INC destination ;dest = dest + 1 


Function: Adds 1 to the register or memory location specified by the operand. 
Note that CF is not affected even if a value FFFF is incremented to 0000. 


INT Interrupt 
Flags: Affected: IF, TF. 
Format: INT type ;transfer control to INT type 


Function: Transfers execution to one of the 256 interrupts. The vector address is 
specified by the type number, which cannot be greater than FFH (0 to FF = 256 interrupts). 


The following steps are performed for the interrupt: 


1. SP is decremented by 2 and the flags are pushed onto the stack. 

2. SP is decremented by 2 and CS is pushed onto the stack. 

3. SP is decremented by 2 and the IP of the next instruction after the interrupt is pushed 
onto the stack. 

4. The type number is multiplied by 4 to get the address of the vector table entry. Starting 
at this address, the first 2 bytes are the value of IP and the next 2 bytes are the value for CS 
of the interrupt handler (interrupt handler is also called interrupt service routine). 

5. IF and TF are reset. 


Interrupts are used to get the attention of the microprocessor. In the 8088/86 there 
are a total of 256 interrupts: INT 00, INT 01, INT 02, ... , INT FF. As mentioned above, 
the address that an interrupt jumps to is always four times the value of the interrupt number. 
For example, INT 03 will jump to memory address 0000CH (4 x 03 = 12 = 0CH). 
Table B-1 is a partial list of the interrupts, commonly referred to as the interrupt vector 
table. 

Every interrupt has a program associated with it called the interrupt service rou- 
tine (ISR). When an interrupt is invoked, the CS:IP address of its ISR is retrieved from the 
vector table (shown above). The lowest 1024 bytes (256 x 4 = 1024) of RAM are set aside 
for the interrupt vector table and must not be used for any other function. 




Table B-1: Interrupt Vector Table 


INT # (hex)    Physical Address    Logical Address 
INT 00         00000               0000:0000 
INT 01         00004               0000:0004 
INT 02         00008               0000:0008 
INT 03         0000C               0000:000C 
INT 04         00010               0000:0010 
INT 05         00014               0000:0014 
...            ...                 ... 
INT FF         003FC               0000:03FC 


Example: Find the physical and logical addresses of the vector table associated with 
(a) INT 14H and (b) INT 38H. 


Solution: 


(a) The physical address for INT 14H is 00050H-00053H (4 x 14H = 50H). That gives 
the logical address of 0000:0050H-0000:0053H. 


(b) The physical address for INT 38H is 000E0H-000E3H (4 x 38H = E0H), making the logical 
address 0000:00E0H-0000:00E3H. 
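
The multiply-by-4 rule used in the solution can be checked with a short sketch. Python is used here only as a neutral way to model the arithmetic; the function name `vector_address` is an assumption made for illustration and does not come from the text.

```python
def vector_address(int_type):
    """Return (physical, segment, offset) of the vector table entry for an INT type."""
    if not 0 <= int_type <= 0xFF:
        raise ValueError("interrupt type must be in the range 00H-FFH")
    physical = int_type * 4            # each entry is 4 bytes: IP first, then CS
    return physical, 0x0000, physical  # the table starts at 0000:0000


for n in (0x14, 0x38, 0xFF):
    phys, seg, off = vector_address(n)
    print(f"INT {n:02X}H: physical {phys:05X}H-{phys + 3:05X}H, "
          f"logical {seg:04X}:{off:04X}H-{seg:04X}:{off + 3:04X}H")
```

Because the table sits in segment 0000, the offset part of the logical address is numerically the same as the physical address.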


The difference between INTerrupt and CALL instructions 


The following are some of the differences between the INT and CALL FAR 
instructions: 


1. While a CALL can jump to any location within the 1-megabyte address range 
(00000—FFFFF) of the 8088/86 CPU, "INT nn" jumps to a fixed location in the vec- 
tor table as discussed earlier. 

2. While the CALL is used by the programmer at a predetermined point in a program, 
a hardware interrupt can come in at any time. 

3. A CALL cannot be masked (disabled), but an "INT nn" belonging to an externally 
activated hardware interrupt (through INTR) can be masked. 

4. While a "CALL FAR" automatically saves on the stack only the CS:IP of the next 
instruction, "INT nn" saves the FR (flag register) in addition to the CS:IP. 

5. While at the end of the procedure that has been CALLed the RETF (return FAR) is 
used, for "INT nn" the instruction IRET (interrupt return) is used. 


The 256 interrupts can be categorized into two different groups: hardware and 
software interrupts. 


Hardware interrupts 


The 8088/86 microprocessors have two pins set aside for inputting hardware 
interrupts. They are INTR (interrupt request) and NMI (nonmaskable interrupt). Although 
INTR can be ignored through the use of software masking, NMI cannot be masked using 
software. These interrupts are activated externally by putting 5 volts on the hardware pins 
of NMI or INTR. Intel has assigned INT 02 to NMI. When it is activated it will jump to 
memory location 00008 to get the address (CS:IP) of the interrupt service routine (ISR). 
Memory locations 00008, 00009, 0000A, and 0000B contain the 4-byte CS:IP. There is 
no specific location in the vector table assigned to INTR because INTR is used to expand 
the number of hardware interrupts and should be allowed to use any "INT nn" instruction 
that has not been assigned previously. In the IBM PC, one Intel 8259 PIC (programma- 
ble interrupt controller) chip is connected to INTR to add a total of eight hardware inter- 
rupts to the microprocessor. IBM PC AT, PS/2 80286, and x86 computers use two 8259 
chips to allow up to 15 hardware interrupts. 




Software interrupts 


These interrupts are called software interrupts since they are invoked as a result 
of the execution of an instruction and no external hardware is involved. In other words, 
these interrupts are invoked by executing an "INT nn" instruction such as the DOS func- 
tion call "INT 21H" or video interrupt "INT 10H". These interrupts can be invoked by a 
program at any time, the same as any other instruction. Many of the interrupts in this cat- 
egory are used by the DOS operating system and IBM BIOS to perform the essential tasks 
that every computer must provide to the system and the user. Also within this group of 
interrupts are predefined functions associated with some of the interrupts. They are "INT 
00" (divide error), "INT 01" (single step), "INT 03" (breakpoint), and "INT 04" (signed 
number overflow). Each one is described below. These interrupts are shown in Table B- 
2. Looking at Table B-2, one can say that aside from "INT 00" to "INT 04", which have 
predefined functions, the rest of the interrupts, from "INT 05" to "INT FF", can be used 
to implement either software or hardware interrupts. 


Table B-2: IBM PC Interrupt System 


Interrupt    Logical Address    Physical Address    Purpose 
...          ...                ...                 ... 
...          ...                ...                 DOS function calls 


Functions associated with "INT 00" to "INT 03" 


As mentioned earlier, interrupts "INT 00" to "INT 03" have predefined functions 
and cannot be used in any other way. The function of each is described next. 


INT 00 (divide error) 


This interrupt, sometimes referred to as a conditional or exception interrupt, is 
invoked by the microprocessor whenever there is a condition that it cannot take care of, 
such as an attempt to divide a number by zero. "INT 00" is invoked by the microproces- 
sor whenever there is an attempt to divide a number by zero. In the IBM PC and compat- 
ibles, the service subroutine for this interrupt is responsible for displaying the message 
"DIVIDE ERROR" on the screen if a program such as the following is executed: 


MOV AL,25 ;put 25 into AL 
MOV BL,00 ;put 00 into BL 
DIV BL ;divide 25 by 00 


This interrupt is also invoked if the quotient is too large to fit into the assigned 
register when executing a DIV instruction. 
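
Both causes of the type 0 interrupt (a zero divisor and a quotient too large for the register) can be modeled in a few lines. This is only a sketch in Python of the byte form of DIV; `div8` and `DivideError` are hypothetical names, with the exception standing in for "INT 00".

```python
class DivideError(Exception):
    """Stands in for the type 0 (divide error) interrupt."""


def div8(ax, divisor):
    """Model DIV with a byte operand: AX / divisor -> (AL = quotient, AH = remainder)."""
    ax &= 0xFFFF
    divisor &= 0xFF
    if divisor == 0:
        raise DivideError("divide by zero")
    quotient, remainder = divmod(ax, divisor)
    if quotient > 0xFF:                 # quotient must fit in the 8-bit AL register
        raise DivideError("quotient too large for AL")
    return quotient, remainder


print(div8(25, 4))                      # quotient 6, remainder 1
for ax, bl in ((25, 0), (0x1234, 0x12)):
    try:
        div8(ax, bl)
    except DivideError as err:
        print(f"AX={ax:04X}H, BL={bl:02X}H -> INT 00 ({err})")
```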


INT 01 (single step) 


There is often a need to execute a given program one instruction at a time and then 
inspect the registers (possibly memory as well) to see what is happening inside the CPU. 
This is commonly referred to as single-stepping. IBM and Microsoft call it TRACE in the 
DEBUG program. To allow the implementation of single-stepping, Intel has set aside 
"INT 01" specifically for that purpose. For the Trace command in DEBUG after execu- 
tion of each instruction, the CPU jumps automatically to physical location 00004 to fetch 
the 4 bytes for CS:IP of the interrupt service routine. One of the functions of this ISR is 
to dump the contents of the registers onto the screen. 




INT 02 (nonmaskable interrupt) 


This interrupt is used in the PC to indicate memory errors, among other problems. 
INT 03 (breakpoint) 


While in single-step mode, one can inspect the CPU and system memory after 
execution of each instruction. A breakpoint allows one to do the same thing, after execu- 
tion of a group of instructions rather than after each instruction. Breakpoints are put in at 
certain points of a program to monitor the flow of the program and to inspect the results 
after certain instructions. The CPU executes the program to the breakpoint and stops. One 
can proceed from breakpoint to breakpoint until the program is complete. With the help 
of single-step and breakpoints, programs can be debugged and tested more easily. The 
Intel 8088/86 CPUs have set aside "INT 03" for the sole purpose of implementing break- 
points. When the instruction "INT 03" is placed in a program the CPU will execute the 
program until it encounters "INT 03", and then it stops. One interesting point about this 
interrupt is that it is a one-byte instruction, in contrast to all other interrupt instructions, 
"INT nn", which are two-byte instructions. This allows the user to insert 1 byte of code 
and remove it to proceed with the execution of the program. The opcode for INT 03 is 
CCH. 


IBM PC and DOS assignment of interrupts 


When the IBM PC was being developed, the designers at IBM had to coordinate 
the assignment of the 256 available interrupts for the 8088/86 with Microsoft, the devel- 
oper of the DOS operating system, lest a conflict occur between the BIOS and DOS inter- 
rupt designations. The result of cooperation in assigning interrupts to IBM BIOS subrou- 
tines and DOS function calls is shown in Table B-2. The table gives a partial listing of 
interrupt numbers from 00 to FF, the logical address of the service subroutine for each 
interrupt, their physical addresses, and the purpose of each interrupt. It must be men- 
tioned that depending on the computer and the DOS version, some of the logical address- 
es could be different from Table B-2. 


How to get the vector table of any PC 


One can get the vector table of any x86 IBM-compatible computer and inspect the 
logical address assigned to each interrupt. To do that use DEBUG's DUMP command "-D 
0000:0000", as shown next. 


A>debug 
-D 0000:0000 
0000:0000 E8 56 2B 02 56 07 70 00-C3 E2 00 F0 56 07 70 00 
0000:0010 26 07 70 00 54 FF 00 F0-47 FF 00 F0 47 FF 00 F0 


Note: The contents of the memory locations could be different, depending on the 
DOS version. 


Example: 
From the dump above, find the CS:IP of the service routine associated with 
INT 5. 


Solution: 

To get the address of "INT 5", calculate the physical address of 00014H (5 x 4 = 20 = 
14H). The contents of these locations are 00014 = 54, 00015 = FF, 00016 = 00, and 
00017 = F0. This gives CS = F000 and IP = FF54. 
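
The byte order in this solution (low byte first, IP before CS) can be sketched as follows. The Python below is an illustrative model only: `read_vector` is a hypothetical helper, and the memory image holds just the four bytes quoted in the solution, with everything else left at zero.

```python
import struct


def read_vector(memory, int_type):
    """Return (CS, IP) for int_type; each word is stored low byte first, IP then CS."""
    ip, cs = struct.unpack_from("<HH", memory, int_type * 4)
    return cs, ip


memory = bytearray(1024)                              # the 1024-byte vector table
memory[0x14:0x18] = bytes([0x54, 0xFF, 0x00, 0xF0])   # bytes at 00014H-00017H

cs, ip = read_vector(memory, 5)
print(f"INT 5 -> CS:IP = {cs:04X}:{ip:04X}")
```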


INTO Interrupt on Overflow 
Flags: Affected: IF, TF. 
Format: INTO 


Function: Transfers execution to an interrupt handler written for overflow if OF 
(overflow flag) has been set. Intel has set aside INT 4 for this purpose. Therefore, if OF 
= 1 when INTO is executed, the CPU jumps to memory location 00010H (4 x 4 = 16 = 
10H). The contents of memory locations 10H, 11H, 12H, and 13H are used as IP and CS 
of the interrupt handler procedure. This instruction is widely used to detect overflow in 
signed number addition. In signed number operations, OF becomes 1 in two cases: 


1. Whenever there is a carry from D6 to D7 in 8-bit operations and no carry from D7 
out (or in 16-bit operations when there is carry from D14 to D15 and CF = 0) 

2. When there is carry from D7 out and no carry from D6 to D7 (or in the case of 16- 
bit operations when there is a carry from D15 out and no carry from D14 to D15) 
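
The two cases amount to saying that OF is set when the carry into the sign bit differs from the carry out of it. A minimal Python sketch of that rule follows; the function name `add_flags` and the `bits` parameter are assumptions made for illustration.

```python
def add_flags(a, b, bits=8):
    """Add two `bits`-wide values and return (result, CF, OF)."""
    mask = (1 << bits) - 1
    low_mask = mask >> 1                     # every bit below the sign bit
    a &= mask
    b &= mask
    # Carry from D6 to D7 (or D14 to D15 for 16-bit operands).
    carry_in = ((a & low_mask) + (b & low_mask)) >> (bits - 1)
    total = a + b
    carry_out = total >> bits                # carry out of the sign bit, i.e. CF
    overflow = carry_in ^ carry_out          # OF = 1 when exactly one of them occurred
    return total & mask, carry_out, overflow


for a, b in ((0x60, 0x46), (0x80, 0x80), (0x01, 0xFF)):
    result, cf, of = add_flags(a, b)
    print(f"{a:02X}H + {b:02X}H = {result:02X}H  CF={cf} OF={of}")
```

In the first pair only the carry into D7 occurs and in the second only the carry out, so both set OF; in the third both carries occur and OF stays 0.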


IRET Interrupt Return 
Flags: Affected: OF, DF, IF, TF, SF, ZF, AF, PF, CF. 
Format: IRET 


Function: Used at the end of an interrupt service routine (interrupt handler), this 
instruction restores all flags, CS, and IP to the values they had before the interrupt so that 
execution may continue at the next instruction following the INT instruction. While the 
RET instruction is used at the end of the subroutine associated with the CALL instruction, 
IRET must be used for the subroutine associated with the "INT XX" instruction or the 
hardware interrupt handler. 


JUMP Instructions 


The following instructions are associated with jumps (both conditional and 
unconditional). They are categorized according to their usage rather than alphabetically. 


J condition 


Flags: Unchanged. 
Format: Jxx target ;jump to target upon condition 


Function: Used to jump to a target address if certain conditions are met. The target 
address must be within -128 to +127 bytes of the IP of the instruction following the 
jump. The conditions are indicated by the flag register. The conditions that determine 
whether the jump takes place can be categorized into three groups: 


1. flag values, 
2. the comparison of unsigned numbers, and 
3. the comparison of signed numbers. 


Each is explained next. 

1. "J condition" where the condition refers to flag values. The status of each bit of the 
flag register has been decided by execution of instructions prior to the jump. The following 
"J condition" instructions check if a certain flag bit is raised or not. 


JC     Jump Carry          jump if CF=1 
JNC    Jump No Carry       jump if CF=0 
JP     Jump Parity         jump if PF=1 
JNP    Jump No Parity      jump if PF=0 
JZ     Jump Zero           jump if ZF=1 
JNZ    Jump No Zero        jump if ZF=0 
JS     Jump Sign           jump if SF=1 
JNS    Jump No Sign        jump if SF=0 
JO     Jump Overflow       jump if OF=1 
JNO    Jump No Overflow    jump if OF=0 


Notice that there is no "J condition" instruction for AF. 




2. "J condition" where the condition refers to the comparison of unsigned numbers. 


After a compare (CMP dest,source) instruction is executed, CF and ZF indicate the 
result of the comparison, as follows: 


CF ZF 
destination > source 0 0 
destination = source 0 1 
destination < source 1 0 


Since the operands compared are viewed as unsigned numbers, the following "J con- 
dition" instructions are used. 


JA     Jump Above             jump if CF=0 and ZF=0 
JAE    Jump Above or Equal    jump if CF=0 
JB     Jump Below             jump if CF=1 
JBE    Jump Below or Equal    jump if CF=1 or ZF=1 
JE     Jump Equal             jump if ZF=1 
JNE    Jump Not Equal         jump if ZF=0 
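
The unsigned conditions can be exercised with a small model of CMP. This Python sketch is illustrative only; `cmp_unsigned` and the predicate names simply mirror the mnemonics and are not part of any real library.

```python
def cmp_unsigned(dest, src, bits=8):
    """Return (CF, ZF) as left by CMP dest,src with the operands viewed as unsigned."""
    mask = (1 << bits) - 1
    dest &= mask
    src &= mask
    cf = 1 if dest < src else 0      # a borrow was needed
    zf = 1 if dest == src else 0
    return cf, zf


def ja(cf, zf):
    return cf == 0 and zf == 0


def jae(cf, zf):
    return cf == 0


def jb(cf, zf):
    return cf == 1


def jbe(cf, zf):
    return cf == 1 or zf == 1


for dest, src in ((0xF0, 0x10), (0x10, 0x10), (0x10, 0xF0)):
    cf, zf = cmp_unsigned(dest, src)
    taken = [name for name, test in (("JA", ja), ("JAE", jae), ("JB", jb), ("JBE", jbe))
             if test(cf, zf)]
    print(f"CMP {dest:02X}H,{src:02X}H -> CF={cf} ZF={zf}, taken: {', '.join(taken)}")
```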


3. "J condition" where the condition refers to the comparison of signed numbers. In the 
case of the signed number comparison, although the same instruction, "CMP destina- 
tion,source", is used, the flags used to check the result are as follows: 


destination > source    OF=SF and ZF=0 
destination = source    ZF=1 
destination < source    OF inverse of SF 


Consequently, the "J condition" instructions used are different. They are as follows: 


JG     Jump Greater             jump if ZF=0 and OF=SF 
JGE    Jump Greater or Equal    jump if OF=SF 
JL     Jump Less                jump if OF≠SF 
JLE    Jump Less or Equal       jump if ZF=1 or OF≠SF 
JE     Jump if Equal            jump if ZF=1 
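
The signed conditions depend on SF, OF, and ZF together, which is easiest to see by modeling the subtraction that CMP performs. The Python below is a sketch under that assumption; `cmp_signed` and `to_signed` are hypothetical helper names.

```python
def cmp_signed(dest, src, bits=8):
    """Return (SF, OF, ZF) as left by CMP dest,src, which computes dest - src."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    dest &= mask
    src &= mask
    result = (dest - src) & mask
    sf = 1 if result & sign else 0
    zf = 1 if result == 0 else 0
    # Signed overflow: operand signs differ and the result's sign differs from dest's.
    of = 1 if (dest ^ src) & (dest ^ result) & sign else 0
    return sf, of, zf


def to_signed(value, bits=8):
    """Interpret a `bits`-wide pattern as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def jg(sf, of, zf):
    return zf == 0 and sf == of


def jge(sf, of, zf):
    return sf == of


def jl(sf, of, zf):
    return sf != of


def jle(sf, of, zf):
    return zf == 1 or sf != of


sf, of, zf = cmp_signed(0x7F, 0xFF)      # +127 compared with -1
print(f"SF={sf} OF={of} ZF={zf} JG taken: {jg(sf, of, zf)}")
```

For +127 compared with -1 the subtraction overflows, so SF and OF are both 1 and JG is still taken; testing SF alone would give the wrong answer.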


There is one more "J condition" instruction: 


JCXZ Jump if CX is Zero. ZF is ignored. 


All "J condition" instructions are short jumps, meaning that the target address cannot 
be more than 128 bytes backward or 127 bytes forward from the IP of the instruction 
following the jump. What happens if a programmer needs to use a "J condition" to 
go to a target address beyond the -128 to +127 range? The solution is to use the 
"J condition" along with the unconditional JMP instruction, as shown next. 


ADD BX,[SI] 
JNC NEXT 
JMP TARGET1 
NEXT: 


TARGET1: ADD DI,10 

JMP Unconditional Jump 

Flags: Unchanged. 

Format: JMP [directives] target ;jump to target address 


Function: This instruction is used to transfer control unconditionally to a new 
address. The difference between JMP and CALL is that the CALL instruction will return 
and continue execution with the instruction following the CALL, whereas JMP will not 
return. The target address could be within the current code segment, which is called a near 
jump, or outside the current code segment, which is called a far jump. Within each category 
there are many ways to code the target address, as shown next. 


1. Near jump 


(a) Direct short jump: In this jump the target address must be within -128 to +127 
bytes of the IP of the instruction after the JMP. This is a 2-byte instruction. The first byte 
is the opcode EBH and the second byte is the signed number displacement, which is added 
to the IP of the instruction following the JMP to get the target address. The directive 
SHORT must be coded, as shown next: 


JMP SHORT OVER 
OVER: 


If the target address is beyond the -128 to +127 byte range and the SHORT directive 
is coded, the assembler gives an error. 

(b) Direct jump: This is a 3-byte instruction. The first byte is the opcode E9H 
and the next two bytes are the signed number displacement value. The displacement is 
added to the IP of the instruction following the JMP to get the target address. The 
displacement can be in the range -32,768 to +32,767. In the absence of the SHORT directive, 
the assembler in its first pass always uses this kind of JMP, and then in the second 
pass if the target address is within the -128 to +127 byte range, it uses the NOP opcode 
90H for the third byte. This is the reason to code the directive SHORT if it is known that 
the target address of the JMP is within the short range. 
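
The displacement arithmetic for the two direct forms can be sketched as below. This is a simplified Python model, not how any particular assembler is implemented: `encode_jmp` is a hypothetical helper that always picks the short form when the target is in range, and it takes plain offsets within one code segment.

```python
def encode_jmp(jmp_offset, target_offset):
    """Encode a direct intrasegment JMP placed at jmp_offset that goes to target_offset."""
    short_disp = target_offset - (jmp_offset + 2)   # relative to the IP after a 2-byte JMP
    if -128 <= short_disp <= 127:
        return bytes([0xEB, short_disp & 0xFF])     # EBH + 8-bit signed displacement
    # Relative to the IP after a 3-byte JMP; the offset wraps within the 64K segment.
    near_disp = (target_offset - (jmp_offset + 3)) & 0xFFFF
    return bytes([0xE9, near_disp & 0xFF, near_disp >> 8])   # low byte first


for src, dst in ((0x0100, 0x0110), (0x0100, 0x00F0), (0x0100, 0x0300)):
    print(f"JMP at {src:04X}H to {dst:04X}H: {encode_jmp(src, dst).hex(' ').upper()}")
```

A backward target produces a negative displacement, which appears in the encoding as its 2's complement (for example EEH for -18).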

(c) Register indirect jump: In this jump the target address is in a register as shown 
next: 


JMP DI ;jump to the address found in DI 
Any nonsegment register can be used for this purpose. 


(d) Memory indirect jump: In this jump the target address is in a memory loca- 
tion whose address is pointed at by a register: 


JMP WORD PTR [SI] ;jump to the address found at the address in SI 
The directive WORD PTR must be coded to indicate this is a near jump. 
2. Far jump 


In a far JMP, the target address is outside the present code segment; therefore, not 
only the offset value but also the segment value of the target address must be given. A far 
jump is a 5-byte instruction: the opcode EAH and 4 bytes for the offset and segment of 
the target address. The following shows the two methods of coding the far jump. 

(a) Direct far jump: This requires that both CS and IP be updated. One way to 
do that is to use the LABEL directive: 


JMP TARGET2 


TARGET2 LABEL FAR 
ENTRY: 


This is exactly what IBM did in the BIOS of the IBM PC/XT when the computer 
is booted. When the power to the PC is turned on, the 8088/86 CPU begins to execute 
at address FFFF:0000H. IBM uses a FAR jump to make it go to location F000:E05BH, 
as shown next: 


;CS=FFFF and IP=0000 
0000 EA5BE000F0 JMP RESET 


;CS=F000 
ORG 0E05BH 
E05B RESET LABEL FAR 
E05B START: 
E05B CLI 
E05C 


The EXTRN and PUBLIC directives also can be used for the same purpose. 
(b) Memory indirect far jump: The target address (both CS:IP) is in a memory 
location pointed to by the register: 


JMP DWORD PTR [BX] 


The DWORD and PTR directives must be used to indicate that it is a far jump. 
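
The 5-byte layout of the direct far jump in the BIOS listing (opcode EAH, then offset and segment, each low byte first) can be decoded with a short sketch. The Python helpers `decode_far_jmp` and `physical_address` are illustrative names only.

```python
def physical_address(segment, offset):
    """20-bit physical address of segment:offset on the 8088/86."""
    return ((segment << 4) + offset) & 0xFFFFF


def decode_far_jmp(code):
    """Decode a direct far JMP: EAH, offset low, offset high, segment low, segment high."""
    if len(code) != 5 or code[0] != 0xEA:
        raise ValueError("not a 5-byte direct far JMP")
    offset = code[1] | (code[2] << 8)
    segment = code[3] | (code[4] << 8)
    return segment, offset


seg, off = decode_far_jmp(bytes.fromhex("EA5BE000F0"))
print(f"power-on address FFFF:0000 = {physical_address(0xFFFF, 0x0000):05X}H")
print(f"JMP target {seg:04X}:{off:04X} = {physical_address(seg, off):05X}H")
```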


LAHF Load AH from Flags 
Flags: Unchanged. 
Format: LAHF 


Function: Loads the lower 8 bits of the flag register into AH. 


LDS Load Data Segment Register 
Flags: Unchanged. 
Format: LDS dest,source ;load dest and DS starting at source 


Function: Loads into destination (which is a register) the contents of two memo- 
ry locations indicated by source and loads DS with the contents of the two succeeding 
memory locations. This is useful for accessing a new data segment and its offset. 


Example: 


Assume the following memory locations with the contents: 
;DS:1200=(46) 
;DS:1201=(10) 
;DS:1202=(38) 
;DS:1203=(82) 
LDS DI,[1200] ;now DI=1046 and DS=8238. 


LEA Load Effective Address 
Flags: Unchanged. 
Format: LEA dest,source ;dest = OFFSET source 


Function: Loads into the destination (a 16-bit register) the effective address of a 
direct memory operand. 


Example 1: 
ORG 0100H 
DATA DB Sa DOr ln SORTS S48, S AS) 


;to access the sixth element: 
LEA SI,DATA+5 ;SI=100H+5=105H THE EFFECTIVE ADDRESS 
MOV AL,[SI] ;GET THE SIXTH ELEMENT 


Example 2: 
;if BX=2000H and SI=3500H 
LEA DX,[BX][SI]+100H 
;DX=effective address=2000+3500+100=5600H 


The following two instructions show two different ways to accomplish the same 
thing: 


MOV SI,OFFSET DATA ;advantage: executes faster 
LEA SI,DATA 


LES Load Extra Segment Register 
Flags: Unchanged. 
Format: LES dest,source ;load dest and ES starting at source 


Function: Loads into destination (a register) the contents of two memory loca- 
tions indicated by the source and loads ES with the contents of the two succeeding mem- 
ory locations. Useful for accessing a new extra segment and its offset. This instruction is 
similar to LDS except that the ES and its offset are being loaded. 


LOCK Lock System Bus Prefix 
Flags: Unchanged. 
Format: LOCK ;used as a prefix before instructions 


Function: Used in microcomputer systems with more than one processor to pre- 
vent another processor from gaining control over the system bus during execution of an 
instruction. 


LODS/LODSB/LODSW Load Byte or Word String 


Flags: Unchanged. 
Format: LODSx 


Function: Loads AL or AX with a byte or word from the memory location point- 
ed to by DS:SI. If DF = 0, SI will be incremented to point to the next location. If DF = 
1, SI will be decremented to point to the next location. SI is incremented/decremented by 
1 or 2, depending on whether it is a byte or word string. 


LOOP Loop until CX = 0 
Flags: Unchanged. 
Format: LOOP target ;DEC CX, then jump to target if CX not 0 


Function: Decrements CX by 1, then jumps to the offset indicated by the operand 
if CX is not zero; otherwise continues with the next instruction below the LOOP. This 
instruction is equivalent to the following pair, except that LOOP leaves the flags unchanged: 


DEC CX 
JNZ target 


LOOPE/LOOPZ LOOP if Equal / Loop if Zero 


Flags: Unchanged. 
Format: LOOPx target ;DEC CX, jump to target if CX not 0 and ZF=1 


Function: Decrements CX by 1, then jumps to location indicated by the operand 
if CX is not zero and ZF is 1; otherwise continues with the next instruction after the 
LOOP. In other words, it gets out of the loop only when CX becomes zero or when ZF = 
0. 


Example: Assume that 200H memory locations from offset 1680H 
should contain 55H. LOOPE can be used to see if any of these 
locations does not contain 55H: 


MOV CX,200H ;SET UP THE COUNTER 
MOV SI,167FH ;SET UP THE POINTER ONE BELOW OFFSET 1680H 
BACK: INC SI ;INCREMENT THE POINTER BEFORE THE COMPARE, 
;BECAUSE INC WOULD OVERWRITE ZF 
CMP BYTE PTR [SI],55H ;COMPARE THE 55H WITH MEM LOCATION 
;POINTED AT BY SI 
LOOPE BACK ;CONTINUE THE PROCESS UNTIL CX=0 OR 
;ZF=0. IN OTHER WORDS EXIT IF ONE 
;LOCATION DOES NOT HAVE 55H 


LOOPNE/LOOPNZ LOOP While CX Not Zero and ZF Equal Zero 


Flags: Unchanged. 
Format: LOOPxx target ;DEC CX, then jump if CX not zero and ZF=0 


Function: Decrements CX by 1, then jumps to location indicated by the operand 
if CX is not zero and ZF is 0; otherwise continues with the next instruction below the LOOP. 
In other words it will exit the loop if CX becomes 0 or ZF = 1. 


Example: 
Assume that the daily temperatures for the last 30 days have been stored starting 
at memory location with offset 1200H. LOOPNE can be used to find the first day that had 
a 90-degree temperature. 


MOV CX,30 ;SET UP THE COUNTER 
MOV DI,11FFH ;SET UP THE POINTER ONE BELOW OFFSET 1200H 
AGAIN: INC DI ;ADVANCE FIRST, BECAUSE INC WOULD OVERWRITE ZF 
CMP BYTE PTR [DI],90 
LOOPNE AGAIN 
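
The exit rule of LOOPNE (leave when CX reaches 0 or when ZF = 1) can be traced with a small model. This Python sketch is an assumption-laden illustration: `find_first` is a hypothetical name, a list stands in for memory, and the empty-list guard is a convenience of the model rather than CPU behavior.

```python
def find_first(values, wanted):
    """Scan `values` the way a compare followed by LOOPNE would.

    Returns (index, cx): the index of the first match (or None) and the CX left over.
    """
    if not values:                      # guard for the model only
        return None, 0
    cx = len(values)                    # MOV CX,count
    index = 0                           # stands in for the pointer register
    while True:
        zf = 1 if values[index] == wanted else 0   # CMP sets ZF on a match
        cx = (cx - 1) & 0xFFFF                     # LOOPNE decrements CX first,
        if cx != 0 and zf == 0:                    # then loops only if CX != 0 and ZF = 0
            index += 1
            continue
        return (index if zf else None), cx


print(find_first([70, 85, 90, 88], 90))   # match on the third day
print(find_first([70, 90], 90))           # match on the last day, CX is already 0
print(find_first([70, 71], 90))           # no match, CX is 0 as well
```

The last two calls both finish with CX = 0, so code following the loop has to test ZF (for example with JE or JNE) to tell a match on the final element from no match at all.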

MOV Move 
Flags: Unchanged. 
Format: MOV dest,source ;copy source to dest 


Function: Copies a word or byte from a register, memory location, or immediate 
number to a register or memory location. Source and destination must be of the same size 
and cannot both be memory locations. 


MOVS/MOVSB/MOVSW Move Byte or Word String 


Flags: Unchanged. 
Format: MOVSx 


Function: Moves byte or word from memory location pointed to by DS:SI to 
memory location pointed to by ES:DI. If DF = 0, both pointers are incremented; other- 
wise, they are decremented. SI and DI are incremented/decremented by 1 or 2 depending 
on whether it is a byte or word string. When used with the REP prefix, CX is decrement- 
ed each time until CX is zero. 


MUL Unsigned Multiplication 
Flags: Affected: OF, CF. Unpredictable: SF, ZF, AF, PF. 
Format: MUL source ;AX = source x AL or DX:AX = source x AX 


Function: Multiplies an unsigned byte or word indicated by the operand by an 
unsigned byte or word in AL or AX with the result placed in AX or DX:AX. 


NEG Negate 
Flags: Affected: OF, SF, ZF, AF, PF, CF. 
Format: NEG dest ;negates operand 


Function: Performs 2's complement of operand. Effectively reverses the sign bit 
of the operand. This instruction should only be used on signed numbers. 


NOP No Operation 
Flags: Unchanged. 
Format: NOP 


Function: Performs no operation. Sometimes used for timing delays to waste 
clock cycles. Updates IP to point to next instruction following NOP. 


NOT Logical NOT 
Flags: Unchanged. 
Format: NOT dest ;dest = 1's complement of dest 


Function: Replaces the operand with its negation (the 1's complement). Each bit 
is inverted. 


OR Logical OR 
Flags: Affected: CF=0, OF=0, SF, ZF, PF. Unpredictable: AF. 
Format: OR dest, source ;dest= dest OR source 


Function: Performs logical OR on the bits of two 
operands, replacing the destination operand with the result. 
Often used to turn a bit on. 


OUT Output Byte or Word 
Flags: Unchanged. 
Format: OUT dest,acc ;transfer acc to port dest 


Function: Transfers a byte or word from AL or AX to an output port specified by 
the first operand. Port address can be direct or register indirect as shown next: 


1. Direct: port address is specified directly and cannot be larger than FFH. 


Example 1: 
OUT 68H, AL ;SEND OUT A BYTE FROM AL TO PORT 68H 
or 
OUT 34H, AX ;SEND OUT A WORD FROM AX TO PORT 
;ADDRESSES 34H AND 35H. THE BYTE 
;FROM AL GOES TO PORT 34H AND 
;THE BYTE FROM AH GOES TO PORT 35H 


2. Register indirect: port address is kept by the DX register. Therefore, it can be as 
high as FFFFH. 


Example 2: 
MOV DX, 64B1H ;DX=64B1H 
OUT DX, AL ;SEND OUT THE BYTE IN AL TO THE PORT 
;WHOSE ADDRESS IS POINTED TO BY DX 
or 
OUT DX, AX ;SEND OUT A WORD FROM AX TO PORT 
;ADDRESS POINTED TO BY DX. THE BYTE 
;FROM AL GOES TO PORT DX AND BYTE 
;FROM AH GOES TO PORT DX+1. 




POP POP Word 


Flags: Unchanged. 
Format: POP dest ;dest = word off top of stack 


Function: Copies the word pointed to by the stack pointer to the register or mem- 
ory location indicated by the operand and increments the SP by 2. 


POPF POP Flags off Stack 
Flags: Affected: OF, DF, IF, TF, SF, ZF, AF, PF, CF. 
Format: POPF 


Function: Copies bits previously pushed onto the stack with the PUSHF instruction 
into the flag register. The stack pointer is then incremented by 2. 


PUSH PUSH Word 
Flags: Unchanged. 
Format: PUSH source ;PUSH source onto stack 


Function: Copies the source word to the stack and decrements SP by 2. 


PUSHF PUSH Flags onto stack 
Flags: Unchanged. 
Format: PUSHF 


Function: Decrements SP by 2 and copies the contents of the flag register to the 
stack. 


RCL/RCR Rotate Left through Carry and Rotate Right through Carry 
Flags: Affected: OF, CF. 
Format: RCx dest,n ;dest = dest rotated right/left n bit positions 


Function: Rotates the bits of the operand right or left. The bits rotated out of the 
operand are rotated into the CF and the CF is rotated into the opposite end of the word or 
byte. Note: "n" must be 1 or CL. 
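
Rotating through carry treats CF as an extra bit attached to the operand, so an 8-bit operand behaves like a 9-bit ring. The Python sketch below models that; `rcl` and `rcr` are illustrative functions named after the mnemonics, and the `bits` parameter is an assumption for choosing byte or word size.

```python
def rcl(value, cf, count=1, bits=8):
    """Rotate left through carry; returns (result, CF)."""
    mask = (1 << bits) - 1
    value &= mask
    for _ in range(count):
        msb = (value >> (bits - 1)) & 1
        value = ((value << 1) | cf) & mask    # old CF enters at the LSB
        cf = msb                              # old MSB becomes the new CF
    return value, cf


def rcr(value, cf, count=1, bits=8):
    """Rotate right through carry; returns (result, CF)."""
    mask = (1 << bits) - 1
    value &= mask
    for _ in range(count):
        lsb = value & 1
        value = (value >> 1) | (cf << (bits - 1))   # old CF enters at the MSB
        cf = lsb                                    # old LSB becomes the new CF
    return value, cf


print(rcl(0x80, 0))        # the MSB moves into CF, result becomes 00H
print(rcr(0x00, 1))        # CF moves into the MSB, result becomes 80H
print(rcl(0xA5, 1, 9))     # nine rotations of a byte restore value and CF
```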


RET Return from a Procedure 
Flags: Unchanged. 
Format: RET [n] ;return from procedure 


Function: Used to return from a procedure previously entered by a CALL 
instruction. The IP is restored from the stack and the SP is incremented by 2. If the pro- 
cedure was FAR, then RETF (return FAR) is used, and in addition to restoring the IP, the 
CS is restored from the stack and SP is again incremented by 2. The RET instruction may 
be followed by a number that will be added to the SP after the SP has been incremented. 
This is done to skip over any parameters being passed back to the calling program seg- 
ment. 


ROL/ROR Rotate Left and Rotate Right 


Flags: Affected: OF, CF. 
Format: ROx dest,n ;rotate dest right/left n bit positions 


Function: Rotates the bits of a word or byte indicated by the second operand right 
or left. The bits rotated out of the word or byte are rotated back into the word or byte at 
the opposite end. Note: "n" must be 1 or CL. 


SAHF Store AH in Flag Register 
Flags: Affected: SF, ZF, AF, PF, CF. 
Format: SAHF 




Function: Copies AH to the lower 8 bits of the flag register. 
SAL/SAR Shift Arithmetic Left/ Shift Arithmetic Right 


Flags: Affected: OF, SF, ZF, PF, CF. Unpredictable: AF. 
Format: SAx dest,n ;shift signed dest left/right n bit positions 


Function: Shifts a word or byte left/right. SAR/SAL arithmetic shifts are used 
for signed number shifting. In SAL, as the operand is shifted left bit by bit, the LSB is 
filled with 0s and the MSB is copied to CF. In SAR, as each bit is shifted right, the LSB 
is copied to CF and the empty bits filled with the sign bit (the MSB). SAL/SAR essentially 
multiplies/divides destination by a power of 2 for each bit shift. Note: "n" must be 
1 or CL. 


SBB Subtract with Borrow 
Flags: Affected: OF, SF, ZF, AF, PF, CF. 
Format: SBB dest,source ;dest = dest - CF - source 


Function: Subtracts the source operand from the destination, replacing the destination. 
If CF = 1, it subtracts 1 from the result; otherwise, it executes like SUB. 


SCAS/SCASB/SCASW Scan Byte or Word String 


Flags: Affected: OF, SF, ZF, AF, PF, CF. 
Format: SCASx 


Function: Scans a string of data pointed to by ES:DI for a value that is in AL or AX. 
Often used with the REPE/REPNE prefix. If DF is zero, the address is incremented; 
otherwise, it is decremented. 


SHL/SHR Shift Left/Shift Right 


Flags: Affected: OF, SF, ZF, PF, CF. Unpredictable: AF. 
Format: SHx dest,n ;shift unsigned dest left/right n bit positions 


Function: These are logical shifts used for unsigned numbers, meaning that the 
sign bit is treated as data. In SHR, as the operand is shifted right bit by bit, the LSB is 
copied into CF and the empty bits are filled with 0s instead of the sign bit as is the case 
for SAR. In the case of SHL, as the bits are shifted left, the MSB is copied to CF and empty 
bits are filled with 0, which is exactly the same as SAL. In reality, SAL and SHL are two 
different mnemonics for the same opcode. SHL/SHR essentially multiplies/divides the 
destination by a power of 2 for each bit position shifted. Note: "n" must be 1 or CL. 
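
The difference between the logical and arithmetic right shifts shows up only when the sign bit is 1. The following Python sketch models both for a count of at least 1; `shr` and `sar` are illustrative functions named after the mnemonics.

```python
def shr(value, count=1, bits=8):
    """Logical shift right: vacated bits are filled with 0. Returns (result, CF)."""
    value &= (1 << bits) - 1
    cf = 0
    for _ in range(count):
        cf = value & 1               # the LSB goes to CF
        value >>= 1
    return value, cf


def sar(value, count=1, bits=8):
    """Arithmetic shift right: vacated bits are filled with the sign bit. Returns (result, CF)."""
    value &= (1 << bits) - 1
    sign = value & (1 << (bits - 1))
    cf = 0
    for _ in range(count):
        cf = value & 1               # the LSB goes to CF
        value = (value >> 1) | sign  # the sign bit is copied back in
    return value, cf


print(shr(0xF0, 2))   # 240 / 4 = 60 when F0H is read as unsigned
print(sar(0xF0, 2))   # -16 / 4 = -4 (FCH) when F0H is read as signed
```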


STC Set Carry Flag 


Flags: Affected: CF. 
Format: STC 


Function: Sets CF to 1. 


STD Set Direction Flag 
Flags: Affected: DF. 
Format: STD 


Function: Sets DF to 1. Used widely with string instructions. As explained in the 
string instructions, if DF = 1, the pointers are decremented. 


STI Set Interrupt Flag 
Flags: Affected: IF. 
Format: STI 




Function: Sets IF to 1, allowing the hardware interrupt to be recognized through 
the INTR pin of the CPU. 


STOS/STOSB/STOSW Store Byte or Word String 


Flags: Unchanged. 
Format: STOSx 


Function: Copies a byte or word from AX or AL to a location pointed to by ES:DI 
and updates DI to point to the next string element. The pointer DI is incremented if DF 
is zero; otherwise, it is decremented. 


SUB Subtract 
Flags: Affected: OF, SF, ZF, AF, PF, CF. 
Format: SUB dest,source ;dest = dest - source 


Function: Subtracts the source from the destination and puts the result in the destination.
Sets the carry and zero flags according to the following:


                  CF   ZF
dest > source     0    0    the result is positive
dest = source     0    1    the result is 0
dest < source     1    0    the result is negative in 2's complement


The steps for subtraction performed by the internal hardware of the CPU are as 
follows: 


1. Takes the 2's complement of the source 
2. Adds this to the destination 
3. Inverts the carry and changes the flags accordingly 


The source operand remains unchanged by this instruction. 
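
The flag table and the three hardware steps can be traced with two small subtractions. The operand values are illustrative only.

```asm
        MOV  AL,3FH     ;dest = 3FH
        SUB  AL,23H     ;AL = 1CH, CF = 0, ZF = 0 (dest > source)
                        ;hardware: 3FH + DDH (2's complement of 23H) = 11CH,
                        ;the carry out of 1 is inverted to give CF = 0
        MOV  AL,4CH     ;dest = 4CH
        SUB  AL,6EH     ;AL = DEH, CF = 1, ZF = 0 (dest < source)
                        ;DEH is the 2's complement of 22H, so the result is -22H
```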


TEST Test Bits 
Flags: Affected: OF, SF, ZF, PF, CF. Unpredictable: AF.
Format: TEST dest,source ;performs dest AND source 


Function: Performs a logical AND on two operands, setting flags but leaving the 
contents of both source and destination unchanged. While the AND instruction changes 
the contents of the destination and the flag bits, the TEST instruction changes only the flag 
bits. 


Example: Assume that D0 and D1 of port 27 indicate conditions A
and B, respectively, if they are high and only one of them can
be high at a given time. The TEST instruction can be used as
follows:
        IN   AL,PORT_27
        TEST AL,00000001B    ;CHECK FOR CONDITION A
        JNZ  CASE_A          ;JUMP TO INDICATE CONDITION A
        TEST AL,00000010B    ;CHECK FOR CONDITION B
        JNZ  CASE_B          ;JUMP TO INDICATE CONDITION B


                             ;THERE IS AN ERROR SINCE NEITHER
                             ;A NOR B HAS OCCURRED.




WAIT Puts Processor in WAIT State 


Flags: Unchanged. 
Format: WAIT


Function: Causes the microprocessor to enter an idle state until an external inter- 
rupt occurs. This is often done to synchronize it with another processor or with an exter- 
nal device. 


XCHG Exchange 
Flags: Unchanged. 
Format: XCHG dest,source ;Swaps dest and source 


Function: Exchanges the contents of two registers or a register and a memory 
location.


XLAT Translate 
Flags: Unchanged. 
Format: XLAT 


Function: Replaces contents of AL with the contents of a look-up table whose 
address is specified by AL. BX must be loaded with the start address of the look-up table 
and the element to be translated must be in AL prior to the execution of this instruction. 
AL is used as an offset within the conversion table. Often used to translate data from one 
format to another, such as ASCII to EBCDIC. 
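
A minimal sketch of a table look-up with XLAT follows. The table SQR_TABLE and its contents are hypothetical, and DS is assumed to point to the segment that holds the table.

```asm
;in the data segment
SQR_TABLE DB  0,1,4,9,16,25,36,49,64,81   ;squares of 0 through 9

;in the code segment
        MOV  BX,OFFSET SQR_TABLE  ;BX = start address of the look-up table
        MOV  AL,5                 ;AL = element to be translated (the offset)
        XLAT                      ;AL = byte at DS:[BX+AL] = 25 (19H)
```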


XOR Exclusive OR 

Flags: Affected: CF = 0, OF = 0, SF, ZF, PF. 
Unpredictable: AF. 

Format: XOR dest,source 


Function: Performs a logical exclusive OR on the bits of two operands and puts 
the result in the destination. "XOR AX,AX" can be used to clear AX. 




APPENDIX C 


ASSEMBLER DIRECTIVES AND 
NAMING RULES 


OVERVIEW 


This appendix consists of two sections. The first section 
describes some of the most widely used directives in 80x86 
Assembly language programming. In the second section,
Assembly language rules and restrictions for names and labels
are discussed and a list of reserved words is provided.


SECTION C.1: x86 ASSEMBLER DIRECTIVES 


Directives, or as they are sometimes called, pseudo-ops or pseudo-instructions, 
are used by the assembler to translate Assembly language programs into machine lan- 
guage. Unlike the microprocessor's instructions, directives do not generate any opcode; 
therefore, no memory locations are occupied by directives in the final ready-to-run (exe) 
version of the assembly program. To summarize, directives give directions to the assem- 
bler program to tell it how to generate the machine code; instructions are assembled into 
machine code to give directions to the CPU at execution time. The following are descriptions
of some of the most widely used directives for the 80x86 assembler. They are
given in alphabetical order for ease of reference. 


ASSUME 


The ASSUME directive is used by the assembler to associate a given segment's 
name with a segment register. This is needed for instructions that must compute an 
address by combining an offset with a segment register. One ASSUME directive can be 
used to associate all the segment registers. For example:


ASSUME CS:name1,DS:name2,SS:name3,ES:name4


where name1, name2, and so on, are the names of the segments. The same result
can be achieved by having one ASSUME for each register: 


ASSUME CS:name1
ASSUME DS:name2
ASSUME SS:name3
ASSUME ES:nothing
ASSUME nothing


The key word "nothing" can be used to cancel a previous ASSUME directive.


DB (Define Byte)


The DB directive is used to allocate memory in byte-sized increments. Look at 
the following examples: 


DATA1 DB 23
DATA2 DB 45,97H,10000011B
DATA3 DB 'The planet Earth'


In DATA1 a single byte is defined with initial value 23. DATA2 consists of sev- 
eral values in decimal (45), hex (97H), and binary (10000011B). Finally, in DATA3, the 
DB directive is used to define ASCII characters. The DB directive is normally used to 
define ASCII data. In all the examples above, the address location for each value is 
assigned by the assembler. We can assign a specific offset address by the use of the ORG 
directive. 


DD (Define Doubleword) 


To allocate memory in 4-byte (32-bit) increments, the DD directive is used. Since 
word-sized operands are 16 bits wide (2 bytes) in 80x86 assemblers, a doubleword is 4 
bytes. 


VALUE1 DD 4563F57H
RESULT DD ?          ;RESERVE 4-BYTE LOCATION
DAT4   DD 25000000


It must be noted that the values defined using the DD directive are placed in mem- 
ory by the assembler in low byte to low address and high byte to high address order. This 
convention is referred to as little endian. For example, assuming that offset address 0020 
is assigned to VALUE1 in the example above, each byte will reside in memory as follows:




DS:20=(57)
DS:21=(3F)
DS:22=(56)
DS:23=(04)


DQ (Define Quadword) 


To allocate memory in 8-byte increments, the DQ directive is used. In the 80x86 
a word is defined as 2 bytes; therefore, a quadword is 8 bytes. 


DAT_64B DQ 5677DD4EE4FF45AH
DATS8 DQ 10000000000000 


DT (Define Tenbytes) 


To allocate packed BCD data, 10 bytes at a time, the DT directive is used. This 
is widely used for memory allocation associated with BCD numbers. 


DATA DT 399977653419974 


Notice there is no H for the hexadecimal identifier following the number. This is 
a characteristic particular to the DT directive. In the case of other directives (DB, DW, 
DD, DQ), a number with no H at the end is assumed to be in decimal and will be
converted to hex by the assembler. Remember that the little endian convention is used to
place the bytes in memory, with the least significant byte going to the low address and the 
most significant byte to the high address. DT can also be used to allocate decimal data if 
"d" is placed after the number: 


DATA DT 65535d ;stores hex FFFF in a 10-byte location


DUP (Duplicate) 


The DUP directive can be used to duplicate a set of data a certain number of times 
instead of having to write it over and over. 


DATA1 DB 20 DUP (99)          ;DUPLICATE 99 20 TIMES
DATA2 DW 6 DUP (5555H)        ;DUPLICATE 5555H 6 TIMES
DATA3 DB 10 DUP (?)           ;RESERVE 10 BYTES
DATA4 DB 5 DUP (5 DUP (0))    ;25 BYTES INITIALIZED TO ZERO
DATA5 DB 10 DUP (00,0FFH)     ;20 BYTES ALTERNATE 00, FF


DW (Define Word) 


To allocate memory in 2-byte (16-bit) increments, the DW directive is used. In 
the 80x86 family, a word is defined as 16 bits. 


DATAW_1 DW 5000
DATAW_2 DW 7F6BH


Again, in terms of placing the bytes in memory the little endian convention is 
used with the least significant byte going to the low address and the most significant byte 
going to the high address. 
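
The byte order can be checked by reading a word back one byte at a time. This is only a sketch; it reuses the value 7F6BH from the example above and relies on the PTR directive described later in this section.

```asm
DATAW_2 DW   7F6BH              ;stored as 6BH (low address), then 7FH

        MOV  AL,BYTE PTR DATAW_2      ;AL = 6BH, the least significant byte
        MOV  AH,BYTE PTR DATAW_2+1    ;AH = 7FH, the most significant byte
```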


END 


Every program must have an entry point. To identify that entry point the assem- 
bler relies on the END directive. The labels for the entry and end point must match. 


HERE: MOV AX,DATASEG ;ENTRY POINT OF THE PROGRAM 


END HERE ;EXIT POINT OF THE PROGRAM 




If there are several modules, only one of them can have the entry point, and the 
name of that entry point must be the same as the name for the END directive as shown 
below: 


;from the main program:
          EXTRN PROG1:NEAR


MAIN_PRO: MOV  AX,DATASG    ;THE ENTRY POINT
          MOV  DS,AX


          CALL PROG1
          END  MAIN_PRO     ;THE EXIT POINT


;from the module PROG1:
          PUBLIC PROG1


PROG1     PROC

          RET               ;RETURN TO THE MAIN MODULE
PROG1     ENDP

          END               ;NO LABEL IS GIVEN


Notice the following points about the above code: 


1. The entry point must be identified by a name. In the example above the entry point
is identified by the name MAIN_PRO.

2. The exit point must be identified by the same name given to the entry point,
MAIN_PRO.

3. Since a given program can have only one entry point and one exit point, all modules 
called (either from main or from the submodules) must have directive END with 
nothing after it. 


ENDP (see the PROC directive)


ENDS (see the SEGMENT and STRUC directives)


EQU (Equate)


To assign a fixed value to a name, one uses the EQU directive. The assembler will 
replace each occurrence of the name with the value assigned to it. 


FIX_VALU EQU 1200
PORT_A   EQU 60H
COUNT    EQU 100
MASK_1   EQU 00001111B


Unlike data directives such as DB, DW, and so on, EQU does not assign any 
memory storage; therefore, it can be defined at any time and at any place, and can even 
be used within the code segment. 
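
The substitution can be pictured with a short sketch that reuses two of the names defined above. The instructions around them are illustrative only.

```asm
COUNT   EQU  100
PORT_A  EQU  60H

        MOV  CX,COUNT       ;assembled exactly as MOV CX,100
        IN   AL,PORT_A      ;assembled exactly as IN AL,60H
```

Changing the value in the EQU line and reassembling updates every instruction that uses the name.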


EVEN 


The EVEN directive forces memory allocation to start at an even address. This is 
useful due to the fact that in 8086 and 286 microprocessors, accessing a 2-byte operand 
located at an odd address takes extra time. The use of the EVEN directive directs the 
assembler to assign an even address to the variable. 




ORG 0020H
DATA_1 DB 34H


EVEN
DATA_2 DW 7F5BH


The following shows the contents of memory locations: 


DS:0020 = (34) 
DS:0021 = (?)
DS:0022 = (5B) 
DS:0023 = (7F) 


Notice that the EVEN directive caused memory location DS:0021 to be bypassed, 
and the value for DATA_2 is placed in memory starting with an even address. 


EXTRN (External) 


The EXTRN directive is used to indicate that certain variables and names used in 
a module are defined by another module. In the absence of the EXTRN directive, the 
assembler would search for the definition and give an error when it couldn't find it. The 
format of this directive is 


EXTRN name1:typea [,name2:typeb]


where type will be NEAR or FAR if name refers to a procedure, or will be BYTE, 
WORD, DWORD, QWORD, TBYTE if name refers to a data variable. 


;from the main program:
          EXTRN  PROG1:NEAR
          PUBLIC DATA1


MAIN_PRO: MOV  AX,DATASG    ;THE ENTRY POINT
          MOV  DS,AX


          CALL PROG1
          END  MAIN_PRO     ;THE EXIT POINT


;PROG1 is located in a different file:
          EXTRN  DATA1:WORD
          PUBLIC PROG1

PROG1     PROC


          MOV  BX,DATA1


          RET               ;RETURN TO THE MAIN MODULE
PROG1     ENDP
          END


Notice that the EXTRN directive is used in the main procedure to identify PROG1
as a NEAR procedure. This is needed because PROG1 is not defined in that module.
Correspondingly, PROG1 is defined as PUBLIC in the module where it is defined.
EXTRN is used in the PROG1 module to declare that operand DATA1, of size WORD,
has been defined in another module. Correspondingly, DATA1 is declared as PUBLIC in
the calling module.


GROUP 


The GROUP directive causes the named segments to be linked into the same 64K- 
byte segment. All segments listed in the GROUP directive must fit into 64K bytes. This 
can be used to combine segments of the same type, or different classes of segments. An
example follows:


SMALL_SYS GROUP DTSEG,STSEG,CDSEG


The ASSUME directive must be changed to make the segment registers point to 
the group: 


ASSUME CS:SMALL_SYS,DS:SMALL_SYS,SS:SMALL_SYS


The group will be listed in the list file, as shown below:


Segments and Groups:


Name                      Length  Align  Combine  Class
SMALL_SYS . . . . . . .           GROUP
  STSEG . . . . . . . .   0040    PARA   NONE
  DTSEG . . . . . . . .   0024    PARA   NONE
  CDSEG . . . . . . . .   005A    PARA   NONE


INCLUDE


When there is a group of macros written and saved in a separate file, the 
INCLUDE directive can be used to bring them into another file. In the program listing 
(.lst file), these macros will be identified by the symbol "C" (or "+" in some versions of
MASM) before each instruction to indicate that they are copied to the present file by the 
INCLUDE directive. 


LABEL 


The LABEL directive allows a given variable or name to be referred to by multi- 
ple names. This is often used for multiple definitions of the same variable or name. The 
format of the LABEL directive is 


name LABEL type 


where type may be BYTE, WORD, DWORD, or QWORD. For example, a variable
of name DATA1 is defined as a word and also needs to be accessed as 2 bytes, as
shown in the following:


DATA_B LABEL BYTE
DATA1  DW 25F6H
       MOV AX,DATA1       ;AX=25F6H
       MOV BL,DATA_B      ;BL=F6H
       MOV BH,DATA_B+1    ;BH=25H


The following shows the LABEL directive being used to allow accessing a 32-bit 
data item in 16-bit portions. 


DATA_16 LABEL WORD
DATDD_4 DD 4387983FH
        MOV AX,DATA_16      ;AX=983FH
        MOV DX,DATA_16+2    ;DX=4387H


The following shows its use in a JMP instruction to go to a different code seg- 
ment. 


        JMP PROG_A


PROG_A  LABEL FAR


INITI:  MOV AL,12H
        OUT PORT,AL


In the program above the addresses assigned to the names "PROG_A" and
"INITI" are exactly the same. The same function can be achieved by the following:


        JMP FAR PTR INITI


LENGTH


The LENGTH operator returns the number of items defined by a DUP operand. 
See the SIZE directive for an example. 


OFFSET 


To access the offset address assigned to a variable or a name, one uses the OFFSET
directive. For example, the OFFSET directive was used in the following example to
get the offset address assigned by the assembler to the variable DATA1:


ORG 5600H 
DATA1 DW 2345H 


MOV SI,OFFSET DATA1 ;SI=OFFSET OF DATA1 = 5600H


Notice that this has the same result as "LEA SI,DATA1".


ORG (Origin)


The ORG directive is used to assign an offset address for a variable or name. For
example, to force variable DATA1 to be located starting at offset address 0020, one would
write


ORG 0020H 
DATA1 DW 41F2H 


This ensures the offset addresses of 0020 and 0021 with contents 0020H = (F2) 
and 0021H = (41). 


PAGE 


The PAGE directive is used to make the ".lst" file print in a specific format. The
format of the PAGE directive is 


PAGE [lines],[columns]


The default listing (meaning that no PAGE directive is coded) will have 66 lines 
per page with a maximum of 80 characters per line. This can be changed to 60 and 132 
with the directive "PAGE 60,132". The range for number of lines is 10 to 255 and for 
columns is 60 to 132. A PAGE directive with no numbers will generate a page break. 


PROC and ENDP (Procedure and End Procedure) 


Often, a group of Assembly language instructions will be combined into a proce- 
dure so that it can be called by another module. The PROC and ENDP directives are used 
to indicate the beginning and end of the procedure. For a given procedure the names 
assigned to PROC and ENDP must be exactly the same. 


name1 PROC [attribute]
name1 ENDP

There are two choices for the attribute of the PROC: NEAR or FAR. If no attribute
is given, the default is NEAR. When a NEAR procedure is called, only IP is saved
since CS of the called procedure is the same as the calling program. If a FAR procedure
is called, both IP and CS are saved since the code segment of the called procedure is
different from that of the calling program.
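
The two attributes can be compared in a brief sketch. The procedure names and bodies are hypothetical, and full segment definition is assumed, so a PROC with no attribute is NEAR.

```asm
CLEAR_AX PROC               ;no attribute given, so NEAR by default
        XOR  AX,AX
        RET                 ;near return: restores IP only
CLEAR_AX ENDP

SET_BX  PROC FAR
        MOV  BX,1
        RET                 ;far return: restores both IP and CS
SET_BX  ENDP
```

The same RET mnemonic is written in both procedures; the assembler picks the near or far form from the attribute of the enclosing PROC.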




PTR (Pointer) 


The PTR directive is used to specify the size of the operand. Among the options 
for size are BYTE, WORD, DWORD, and QWORD. This directive is used in many dif- 
ferent ways, the most common of which are explained below. 


1. PTR can be used to allow an override of a previously defined data directive. 


DATA1 DB 23H,7FH,99H,0B2H
DATA2 DW 67F1H
DATA3 DD 22229999H
      MOV AX,WORD PTR DATA1        ;AX=7F23H


      MOV BX,WORD PTR DATA1 + 2    ;BX=B299H


Although DATA1 was initially defined as DB, it can be accessed using the 
WORD PTR directive. 


MOV AL, BYTE PTR DATA2 ;AL=F1H 


In the above code, notice that DATA2 was defined as WORD but it was accessed 
as BYTE with the help of BYTE PTR. If this had been coded as "MOV AL,DATA2", it 
would generate an error since the sizes of the operands do not match. 


MOV AX, WORD PTR DATA3 ; AX=9999H 
MOV DX, WORD PTR DATA3 + 2 ;DX=2222H 


DATA3 was defined as a 4-byte operand but registers are only 2 bytes wide. The 
WORD PTR directive solved that problem. 


2. The PTR directive can be used to specify the size of an operand in order to help the
assembler translate the instruction.


INC [DI] ;will cause an error


This instruction was meant to increment the contents of the memory location(s) 
pointed at by [DI]. How does the assembler know whether it is a byte operand, word 
operand, or doubleword operand? Since it does not know, it will generate an error. To 
correct that, use the PTR directive to specify the size of the operand as shown next. 


INC BYTE PTR [SI]   ;increment a byte pointed by SI

or

INC WORD PTR [SI]   ;increment a word pointed by SI

or

INC DWORD PTR [SI]  ;increment a doubleword pointed by SI


3. The PTR directive can be used to specify the distance of a jump. The options for the 
distance are FAR and NEAR. 


        JMP FAR PTR INITI   ;ensures a 5-byte instruction
INITI:  MOV AX,1200


See the LABEL directive to find out how it can be used to achieve the same result. 




PUBLIC 


To inform the assembler that a name or symbol will be referenced by other mod- 
ules, it is marked by the PUBLIC directive. If a module is referencing a variable outside 
itself, that variable must be declared as EXTRN. Correspondingly, in the module where 
the variable is defined, that variable must be declared as PUBLIC in order to allow it to 
be referenced by other modules. See the EXTRN directive for examples of the use of both 
EXTRN and PUBLIC. 


SEG (Segment Address) 


The SEG operator is used to access the address of the segment where the name
has been defined.


DATA1 DW 2341H
      MOV AX,SEG DATA1 ;AX=SEGMENT ADDRESS OF DATA1


This is in contrast to the OFFSET directive, which accesses the offset address 
instead of the segment. 


SEGMENT and ENDS 


In full segment definition these two directives are used to indicate the beginning 
and the end of the segment. They must have the same name for a given segment defini- 
tion. See the following example: 


DATSEG SEGMENT
DATA1  DB 2FH
DATA2  DW 1200
DATA3  DD 99999999H
DATSEG ENDS


There are several options associated with the SEGMENT directive, as follows: 


name1 SEGMENT [align] [combine] [class]


name1 ENDS


ALIGNMENT: When several assembled modules are linked together, this indi- 
cates where the segment is to begin. There are many options, including PARA (paragraph 
= 16 bytes), WORD, and BYTE. If PARA is chosen, the segment starts at a hex address 
divisible by 10H. PARA is the default alignment. In this alignment, if a segment for a 
module finished at 00024H, the next segment will start at address 00030H, leaving from 
00025 to 0002F unused. If WORD is chosen, the segment is forced to start at a word 
boundary. In BYTE alignment, the segment starts at the next byte and no memory is wast- 
ed. There is also the PAGE option, which aligns segments along the 100H (256) byte 
boundary. While all these options are supported by many assemblers, such as MASM and 
TASM, there is another option supported only by assemblers that allow system develop- 
ment. This option is AT. The AT option allows the program to assign a physical address. 
For example, to burn a program into ROM starting at physical address F0000, code 


ROM_CODE SEGMENT AT 0F000H


Due to the fact that option AT allows the programmer to specify a physical 
address that conflicts with DOS's memory management responsibility, many assemblers 
such as MASM will not allow option AT. 

COMBINE TYPE: This option is used to merge together all the similar segments 
to create one large segment. Among the options widely used are PUBLIC and STACK. 
PUBLIC is widely used in code segment definitions when linking more than one module. 
This will consolidate all the code segments of the various modules into one large code 
segment. If there is only one data segment and that belongs to the main module, there is 
no need to define it as PUBLIC since no other module has any data segment to combine 
with. However, if other modules have their own data segments, it is recommended that
they be made PUBLIC to create a single data segment when they are linked. In the
absence of that, the linker would assume that each segment is private and they would not 
be combined with other similar segments (codes with codes and data with data). Since 
there is only one stack segment, which belongs to the main module, there is no need to 
define it as PUBLIC. The STACK option is used only with the stack segment definition 
and indicates to the linker that it should combine the user's defined stack with the system 
stack to create a single stack for the entire program. This is the stack that is used at run 
time (when the CPU is actually executing the program). 

CLASS NAME: Indicates to the linker that all segments of the same class should
be placed next to each other. Four class names commonly used are
'CODE', 'DATA', 'STACK', and 'EXTRA'. When this attribute is used in the segment
definition, it must be enclosed in single quotes in order to be recognized by the linker.
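
The three options can be seen together in a small full segment definition. This is a sketch only; the segment names, the variable SUM, and the choice of options are illustrative assumptions rather than a required layout.

```asm
STSEG   SEGMENT PARA STACK 'STACK'    ;align=PARA, combine=STACK, class='STACK'
        DB   64 DUP (?)
STSEG   ENDS

DTSEG   SEGMENT PARA PUBLIC 'DATA'    ;PUBLIC lets other modules' DTSEG merge with it
SUM     DW   ?
DTSEG   ENDS

CDSEG   SEGMENT PARA PUBLIC 'CODE'
MAIN    PROC FAR
        ASSUME CS:CDSEG,DS:DTSEG,SS:STSEG
        MOV  AX,DTSEG
        MOV  DS,AX          ;DS must be loaded by the program itself
        MOV  SUM,0
        MOV  AH,4CH         ;return to DOS
        INT  21H
MAIN    ENDP
CDSEG   ENDS
        END  MAIN
```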


SHORT 


In a direct jump such as "JMP POINT_A", the assembler has to choose either the
2-byte or 3-byte format. In the 2-byte format, one byte is the opcode and the second byte
is the signed displacement value added to the IP of the instruction immediately
following the JMP. This displacement can be anywhere between -128 and +127. A negative
number indicates a backward JMP and a positive number a forward JMP. In the
3-byte format the first byte is the opcode and the next two bytes are for the signed
displacement value, which can range from -32,768 to +32,767. When assembling a program,
the assembler makes two passes through the program. Certain tasks are done in the
first pass and others are left to the second pass to complete. In the first pass the assembler
chooses the 3-byte code for the JMP. After the first pass is complete, it will know the
target address and fill it in during the second pass. If the target address indicates a short
jump (less than 128 bytes away), it fills the last byte with NOP. To inform the assembler
that the target address is no more than 128 bytes away, the SHORT directive can be used.
Using the SHORT directive makes sure that the JMP is a 2-byte instruction and not 3-byte
with 1 byte as NOP code. The 2-byte JMP requires 1 byte less memory and is executed
faster.
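
A minimal sketch of a forward jump coded with SHORT follows. The label SKIP and the surrounding instructions are hypothetical.

```asm
        JMP  SHORT SKIP     ;2-byte JMP: opcode plus an 8-bit displacement
        MOV  AX,0           ;skipped by the jump
SKIP:   MOV  BX,1           ;target, well within +127 bytes of the JMP
```

If the target were farther away than the 8-bit displacement allows, the assembler would be expected to report an error rather than silently widen the jump.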


SIZE 


The SIZE operator returns the total number of bytes occupied by a name. The three
directives LENGTH, SIZE, and TYPE are somewhat related. Below is a description of 
each one using the following set of data defined in a data segment: 


DATA1 DQ ?
DATA2 DW ?
DATA3 DB 20 DUP (?)
DATA4 DW 100 DUP (?)
DATA5 DD 10 DUP (?)


TYPE allows one to know the storage allocation directive for a given variable by 
providing the number of bytes according to the following table: 


bytes
1     DB
2     DW
4     DD
8     DQ
10    DT


For example: 


MOV BX,TYPE DATA2 ;BX=2
MOV DX,TYPE DATA1 ;DX=8
MOV AX,TYPE DATA3 ;AX=1
MOV CX,TYPE DATA5 ;CX=4


When a DUP is used to define the number of entries for a given variable, the 
LENGTH directive can be used to get that number. 




MOV CX,LENGTH DATA4 ;CX=64H (100 DECIMAL)
MOV AX,LENGTH DATA3 ;AX=14H (20 DECIMAL)
MOV DX,LENGTH DATA5 ;DX=0AH (10 DECIMAL)


If the defined variable does not have any DUP in it, the LENGTH is assumed to
be 1.


MOV BX, LENGTH DATA1 ; BX=1 


SIZE is used to determine the total number of bytes allocated for a variable that 
has been defined with the DUP directive. In reality the SIZE directive basically provides 
the product of the TYPE times the LENGTH. 


MOV DX,SIZE DATA4 ;DX=C8H=200 (100 x 2=200)
MOV CX,SIZE DATA5 ;CX=28H=40 (4 x 10=40)
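
One common use of these operators is to write a loop that does not hard-code the array dimensions. The sketch below clears a word array; the label CLR_LP is hypothetical and DS is assumed to point to the data segment.

```asm
DATA4   DW   100 DUP (?)

        MOV  CX,LENGTH DATA4    ;CX = 100, the number of elements
        MOV  DI,OFFSET DATA4    ;DI points to the first element
        MOV  AX,0
CLR_LP: MOV  [DI],AX            ;clear one word
        ADD  DI,TYPE DATA4      ;advance by 2, the size of one element
        LOOP CLR_LP             ;repeat until all SIZE DATA4 = 200 bytes are cleared
```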


STRUC (Structure)


The STRUC directive indicates the beginning of a structure definition. It ends 
with an ENDS directive, whose label matches the STRUC label. Although the same 
mnemonic ENDS is used for end of segment and end of structure, the assembler knows 
which is meant by the context. A structure is a collection of data types that can be 
accessed either collectively by the structure name or individually by the labels of the data 
types within the structure. A structure type must first be defined and then variables in the 
data segment may be allocated as that structure type. Looking at the following example, 
the data directives between STRUC and ENDS declare what structure ASC_AREA looks 
like. No memory is allocated for such a structure definition. Immediately below the 
structure definition is the label ASC_INPUT, which is declared to be of type ASC_AREA. 
Memory is allocated for the variable ASC_INPUT. Notice in the code segment that 
ASC_INPUT can be accessed either in its entirety or by its component parts. It is 
accessed as a whole unit in "MOV DX,OFFSET ASC_INPUT". Its component parts are 
accessed by the variable name followed by a period, then the component's name. For 
example, "MOV BL,ASC_INPUT.ACT_LEN" accesses the actual length field of
ASC_INPUT.

;from the data segment:

ASC_AREA  STRUC               ;defines struc for string input
MAX_LEN   DB 6                ;maximum length of input string
ACT_LEN   DB ?                ;actual length of input string
ASC_NUM   DB 6 DUP (?)        ;input string
ASC_AREA  ENDS                ;end struc definition

ASC_INPUT ASC_AREA <>         ;allocates memory for struc


;from the code segment:


GET_ASC:  MOV AH,0AH
          MOV DX,OFFSET ASC_INPUT
          INT 21H


          MOV SI,OFFSET ASC_INPUT.ASC_NUM    ;SI points to ASCII num
          MOV BL,ASC_INPUT.ACT_LEN           ;BL holds string length


TITLE 


The TITLE directive instructs the assembler to print the title of the program on
top of each page of the ".lst" file. What comes after the TITLE pseudo-instruction is up
to the programmer, but it is common practice to put the name of the program as stored on
the disk right after the TITLE pseudo-instruction and then a brief description of the func- 
tion of the program. Whatever is placed after the TITLE pseudo-instruction cannot be 
more than 60 ASCII characters (letters, numbers, spaces, punctuation). 


TYPE 


The TYPE operator returns the number of bytes reserved for the named data 
object. See the SIZE directive for examples of its use. 


SECTION C.2: RULES FOR LABELS AND RESERVED 
NAMES 


Labels in 80x86 Assembly language for MASM 5.1 and higher must follow these 
rules: 


1. Names can be composed of: 
alphabetic characters: A-Z and a-z 
digits: 0-9 
special characters: ? . @ _ $


2. Names must begin with an alphabetic or special character. Names cannot begin with 
a digit. 


3. Names can be up to 31 characters long. 
4. The special character "." can only be used as the first character. 


5. Uppercase and lowercase are treated the same. "NAME1" is treated the same as
"Name1" and "name1".


Assembly language programs have five types of labels or names: 


1. Code labels, which give symbolic names to instructions so that other instructions 
(such as jumps) may refer to them 


2. Procedure labels, which assign a name to a procedure 
3. Segment labels, which assign a name to a segment 
4. Data labels, which give names to data items 


5. Labels created with the LABEL directive


Code labels


These labels will be followed by a colon and have the type NEAR. This enables 
other instructions within the code segment to refer to the instruction. The labels can be 
on the same line as the instruction: 


ADD_LP: ADD AL,[BX]     ;label is on same line as the instruction
        LOOP ADD_LP


or on a line by themselves:


ADD_LP:                 ;label is on a line by itself
        ADD AL,[BX]     ;ADD_LP refers to this instruction
        LOOP ADD_LP


Procedure labels

These labels assign a symbolic name to a procedure. The label can be NEAR or
FAR. When using full segment definition, the default type is NEAR. When using simplified
segment definition, the type will be NEAR for compact or small models but will
be FAR for medium, large, and huge models. For more information on procedures, see
PROC in Section C.1.


Segment labels 


These labels give symbolic names to segments. The name must be the same in the 
SEGMENT and ENDS directives. See SEGMENT in Section C.1 for more information. 


Example: 

DAT_SG SEGMENT
SUM    DW ?
DAT_SG ENDS

Data labels 


These labels give symbolic names to data items. This allows them to be accessed 
by instructions. Directives DB, DW, DD, DQ, and DT are used to allocate data. 


Examples: 

DATA1 DB 43H 
DATA2 DB 0F2H
SUM DW ? 


Labels defined with the LABEL directive 


The LABEL directive can be used to redefine a label. See LABEL in Section C.1 
for more information. 


Reserved Names 


The following is a list of reserved words in 80x86 Assembly language program- 
ming. These words cannot be used as user-defined labels or variable names. 


Register Names: 


AH AL AX BH BL BP BX CH CL CS CX DH
DI DL DS DX ES SI SP SS

Instructions: 
AAA     AAD     AAM     AAS     ADC     ADD
AND     CALL    CBW     CLC     CLD     CLI
CMC     CMP     CMPS    CWD     DAA     DAS
DEC     DIV     ESC     HLT     IDIV    IMUL
IN      INC     INT     INTO    IRET    JA
JAE     JB      JBE     JCXZ    JE      JG
JGE     JL      JLE     JMP     JNA     JNAE
JNB     JNBE    JNE     JNG     JNGE    JNL
JNLE    JNO     JNP     JNS     JNZ     JO
JP      JPE     JPO     JS      JZ      LAHF
LDS     LEA     LES     LOCK    LODS    LOOP
LOOPE   LOOPNE  LOOPNZ  LOOPZ   MOV     MOVS
MUL     NEG     NIL     NOP     NOT     OR
OUT     POP     POPF    PUSH    PUSHF   RCL
RCR     REP     REPE    REPNE   REPNZ   REPZ
RET     ROL     ROR     SAHF    SAL     SAR
SBB     SCAS    SHL     SHR     STC     STD
STI     STOS    SUB     TEST    WAIT    XCHG
XLAT    XOR


Assembler operators and directives: 


$           *           +           =           /
ASSUME      BYTE        COMMENT     DB          DD          DF
DOSSEG      DQ          DS          DT          DUP         DW
DWORD       END         ENDIF       ENDM        ENDS        EQ
EQU         EVEN        EXITM       EXTRN       FAR         FWORD
GE          GROUP       GT          HIGH        IF          IFB
IFDEF       IFDIF       IFE         IFIDN       IFNB        IFNDEF
IF1         IF2         INCLUDE     INCLUDELIB  IRP         IRPC
LABEL       LE          LENGTH      LINE        LOCAL       LOW
LT          MACRO       MASK        MOD         NAME        NE
NEAR        NOTHING     OFFSET      ORG         PAGE        PROC
PTR         PUBLIC      PURGE       QWORD       RECORD      REPT
REPTRD      SEG         SEGMENT     SHORT       SIZE        STACK
STRUC       SUBTTL      TBYTE       THIS        TITLE       TYPE
WIDTH       WORD        .186        .286        .286P       .287
.386        .386P       .387        .8086       .8087       .ALPHA
.CODE       .CONST      .CREF       .DATA       .DATA?      .ERR
.ERR1       .ERR2       .ERRB       .ERRDEF     .ERRDIF     .ERRE
.ERRIDN     .ERRNB      .ERRNDEF    .ERRNZ      .FARDATA    .FARDATA?
.LALL       .LFCOND     .LIST       .MODEL      %OUT        .RADIX
.SALL       .SEQ        .SFCOND     .STACK      .TFCOND     .TYPE
.XALL       .XCREF      .XLIST


APPENDIX D 


INTERRUPT CALLS AND 
LEGACY SOFTWARE 


OVERVIEW 


This appendix lists many of the interrupt calls for INT 
21H, INT 10H, and so on, which are used primarily for input, 
output, and file and memory management. 


SECTION D.1: 21H INTERRUPTS 


AH Function of INT 21H 


00 Terminate the program 


Additional Call Registers Result Registers 
CS = segment address of          None
     PSP (program segment prefix)


Note: Files should be closed previously or data may be lost. 


01 Keyboard input with echo 


Additional Call Registers Result Registers


None AL = input character 


Note: Checks for Ctrl-Break. 


02 Output character to monitor 


Additional Call Registers Result Registers 


DL = character to be displayed None 


03 Asynchronous input from auxiliary device (serial device) 


Additional Call Registers Result Registers 


None AL = input character 


04 Asynchronous character output


Additional Call Registers Result Registers 


DL = character to be output None 


05 Output character to printer 


Additional Call Registers Result Registers 

DL = character to be printed None 
06 Console I/O 

Additional Call Registers Result Registers 

DL = 0FFH if input AL = 0H if no character available

or character to be = character that was input, if 

displayed, if output input successful 


Note: If input, ZF is cleared and AL will have the character. ZF is set if 
input and no character was available. 




07 Keyboard input without echo 


Additional Call Registers Result Registers 
None AL = input character 


Note: Does not check for Ctrl-Break. 


08 Keyboard input without echo 


Additional Call Registers Result Registers 
None AL = input character 


Note: Checks for Ctrl-Break. 
09 String output 


Additional Call Registers Result Registers 
DS:DX = string address None 


Note: Displays characters beginning at address until a '$' (ASCII 36) is 
encountered. 
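
The '$' terminator rule can be modeled in a few lines. The following is a minimal Python sketch, not DOS code: the function name `dos_display_string` and the sample message are illustrative assumptions, and memory is represented as a plain byte string.

```python
def dos_display_string(memory: bytes, offset: int = 0) -> bytes:
    """Return the bytes function 09H would display: everything from
    offset up to, but not including, the first '$' (ASCII 36)."""
    end = memory.index(b"$", offset)  # raises ValueError if no '$' follows offset
    return memory[offset:end]


message = b"Hello, world$ not shown"
print(dos_display_string(message))
```

One consequence visible in the sketch is that a '$' character cannot itself be displayed with this function, since it always ends the string.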


0A String input 


Additional Call Registers Result Registers 
DS:DX = address at which None 


to store string 


Note: Specify the maximum size of the string in byte 1 of the buffer. DOS
will place the actual size of the string in byte 2. The string begins in byte 3. 
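
The buffer layout described in the note can be sketched as follows. This is an illustrative Python model under stated assumptions: the helper names `make_input_buffer` and `read_input_buffer` are hypothetical, the buffer is a `bytearray` rather than memory at DS:DX, and Python indexes from 0, so "byte 1" of the note is `buf[0]`.

```python
def make_input_buffer(max_chars: int) -> bytearray:
    """Build a buffer for function 0AH: byte 1 holds the maximum size."""
    if not 1 <= max_chars <= 255:
        raise ValueError("maximum size must fit in one byte")
    buf = bytearray(max_chars + 2)  # two header bytes plus the string area
    buf[0] = max_chars              # byte 1: maximum size, set by the caller
    return buf


def read_input_buffer(buf: bytearray) -> bytes:
    """Extract the string after the call: byte 2 is the actual size,
    and the string itself begins in byte 3."""
    count = buf[1]
    return bytes(buf[2:2 + count])


# Simulate what DOS would leave behind after the user types "hi".
buf = make_input_buffer(10)
buf[1] = 2
buf[2:4] = b"hi"
print(read_input_buffer(buf))
```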


0B Get keyboard status 


Additional Call Registers Result Registers 
None AL = 00 if no character waiting 


= 0FFH if character waiting


Note: Checks for Ctrl-Break. 
0C Reset input buffer and call keyboard input function


Additional Call Registers Result Registers 
AL = keyboard function number None 


01H, 06H, 07H, 08H or 0AH


Note: This function waits until a character is typed in. 




0D Reset disk 


Additional Call Registers Result Registers 
None None 


Note: Flushes DOS file buffers but does not close files. 
0E Set default drive 


Additional Call Registers Result Registers 

DL = code for drive AL = number of logical drives 

(0 =A, 1 =B, 2 =C, etc.) in system 
0F Open file

Additional Call Registers Result Registers 

DS:DX = address of FCB AL = 00 if successful 


= 0FFH if file not found


Note: Searches current directory for file. If found, FCB is filled. 


SECTION D.2: MOUSE INTERRUPTS 33H 


AX Function of INT 33H 


00 Initialize the mouse 
Additional Call Registers Result Registers 
None AX = 0H if mouse not available 


= FFFFH if mouse available 
BX = number of mouse buttons 


Note: This function is called only once to initialize the mouse. If mouse sup- 
port is present, AX = FFFFH, and the mouse driver is initialized, the mouse 
pointer is set to the center of the screen and concealed. 


01 Display mouse pointer 
Additional Call Registers Result Registers 
None None 


Note: This function displays the mouse pointer and cancels any exclusion 
area. 


02 Conceal mouse pointer 


Additional Call Registers Result Registers 


None None 


Note: This function hides the mouse pointer but the mouse driver monitors 
its position. Most programs issue this command before they terminate. 




03 Get mouse location and button status 
Additional Call Registers Result Registers 
None BX = mouse button status 


bit 0 -- left button 

bit 1 -- right button 

bit 2 — center button 

= 0 if up; = 1 if down 
CX = horizontal position 
DX = vertical position 


Note: The horizontal and vertical coordinates are returned in pixels. 


04 Set mouse pointer location 


Additional Call Registers Result Registers 


CX = horizontal position None 
DX = vertical position 


Note: The horizontal and vertical coordinates are in pixels. Will display the 
mouse pointer only within set limits; will not display in exclusion areas. 


05 Get button press information 


Additional Call Registers Result Registers 
BX = button: 0 for left; AX = button status 
1 for right; 2 for center bit 0 -- left button


bit 1 — right button 

bit 2 -- center button 

= 0 if up; = 1 if down 
BX = button press count 
CX = horizontal position 
DX = vertical position 


Note: This returns the status of all buttons as well as the number of presses 
for the button indicated in BX when called. The position of the mouse pointer 
is given in pixels and represents the position at the last button press. 




06 Get button release information 
Additional Call Registers Result Registers 
BX = button: 0 for left; AX = button status 
1 for right; 2 for center bit 0 -- left button 
bit 1 -- right button 
bit 2 — center button 
= 0 if up; = 1 if down 
BX = button release count 
CX = horizontal position 
DX = vertical position 


Note: This returns the status of all buttons as well as the number of releases 
for the button indicated in BX when called. The position of the mouse pointer 
is given in pixels and represents the position at the last button release. 


07 Set horizontal limits for mouse pointer 


Additional Call Registers Result Registers 


CX = minimum horizontal position None 
DX = maximum horizontal position 


Note: This sets the horizontal limits (in pixels) for the mouse pointer. After
this call, the mouse will be displayed within these limits. 


08 Set vertical limits for mouse pointer 


Additional Call Registers Result Registers 


CX = minimum vertical position None 
DX = maximum vertical position 


Note: This sets the vertical limits (in pixels) for the mouse pointer. After this 
call, the mouse will be displayed within these limits. 


10 Set mouse pointer exclusion area 


Additional Call Registers Result Registers 


CX = upper left horizontal coordinate None 
DX = upper left vertical coordinate 

SI = lower right horizontal coordinate 

DI = lower right vertical coordinate 


Note: This defines an area in which the mouse pointer will not display. An 
exclusion area can be cancelled by calling functions 00 or 01. 




24 Get mouse information 


Additional Call Registers Result Registers 
None BH = major version 


BL = minor version 
CH = mouse type 
CL = IRQ number 
Note: This returns the version number (e.g., version 7.5: BH = 7, BL = 5). 


Mouse type: 1 for bus; 2 for serial; 3 for InPort; 4 for PS/2; 5 for HP; IRQ =0 
for PS/2; otherwise = 2, 3, 4, 5 or 7. 


SECTION D.3: INT 10H 


AH Function 


00 Set video mode 


Additional Call Registers Result Registers 
AL = video mode None 


See Table D-2 for a list of available video modes and their definitions. 


01 Set cursor type 


Additional Call Registers Result Registers 
CH = beginning line of cursor None 
(bits 0—4) 
CL = ending line of cursor 
(bits 0-4) 
Note: All other bits should be set to zero. The blinking of the cursor is hardware 
controlled. 
02 Set cursor position 
Additional Call Registers Result Registers 
BH = page number None 
DH = row 


DL = column 


Note: When using graphics modes, BH must be set to zero. Text coordinates of 
the upper left-hand corner will be (0,0). 




03 Read cursor position and size 
Additional Call Registers Result Registers 
BH = page number CH = beginning line of cursor 
CL = ending line of cursor 
DH = row 


DL = column 


Note: When using graphics modes, BH must be set to zero. 


04 Read light pen position 


Additional Call Registers Result Registers
None AH = 0 if light pen not triggered 


= 1 if light pen triggered

BX = pixel column 

CH = pixel row (modes 04H—06H) 
CX = pixel row (modes 0DH—13H) 
DH = character row 

DL = character column 


05 Select active display page 


Additional Call Registers Result Registers
AL = page number None 


(see Table D-1 below) 
Table D-1: Display Pages for Different Modes and Adapters 


Mode    Pages   Adapters
00H     0-7     VGA
01H     0-7     VGA
02H     0-3     CGA
        0-7     VGA
03H     0-3     CGA
        0-7     VGA
07H     0-7     VGA
0DH     0-7     VGA
0EH     0-3     VGA
0FH     0-1     VGA
10H     0-1     VGA


All other mode-adapter combinations support only one page. 




06 Scroll window up 


Additional Call Registers Result Registers 
AL = number of lines to scroll None 


BH = display attribute 

CH = y coordinate of top left 

CL = x coordinate of top left 

DH = y coordinate of lower right 
DL = x coordinate of lower right 


Note: If AL = 0, the entire window is blanked. Otherwise, the screen will be
scrolled upward by the number of lines in AL. Lines scrolling off the top of the screen are
lost, and blank lines are scrolled in at the bottom according to the attribute in BH.


07 Scroll window down 


Additional Call Registers Result Registers 
AL = number of lines to scroll None 


BH = display attribute 

CH = y coordinate of top left 

CL = x coordinate of top left 

DH = y coordinate of lower right 
DL = x coordinate of lower right 


Note: If AL = 0, the entire window is blanked. Otherwise, the screen will be
scrolled down by the number of lines in AL. Lines scrolling off the bottom of the screen 
are lost, and blank lines are scrolled in at the top according to the attribute in BH. 


08 Read character and attribute at cursor position 
Additional Call Registers Result Registers 
BH = display page AH = attribute byte
AL = ASCII character code


09 Write character and attribute at cursor position 


Additional Call Registers Result Registers 
AL = ASCII character code None 


BH = display page 
BL = attribute 
CX = number of characters to write 


Note: Does not update cursor position. Use interrupt 10 Function 2 to set cursor 
position. 




0A Write character at cursor position 


Additional Call Registers Result Registers 
AL = ASCII character code None 


BH = display page 
BL = graphic color 
CX = number of characters to write 


Note: Writes character(s) using existing video attribute. Does not update 
cursor position. Use interrupt 10 Function 2 to set cursor position. 


0B Set color palette 


Additional Call Registers Result Registers 
BH = 00H to set border or None 


background colors 
= 01H to set palette 
BL = palette/color 


Note: If BH = 00H and in text mode, this function will set the border color 
only. If BH = 00H and in graphics mode, this function will set background and border 
colors. If BH = 01H, this function will select the palette. In 320 x 200 four-color graphics,
palettes 0 and 1 are available:


Pixel Colors for Palettes 0 and 1


Pixel Palette 0 Palette 1
0 background background
1 green cyan
2 red magenta 
3 brown/yellow white 


0C Write pixel


Additional Call Registers Result Registers 


AL = pixel value None 
CX = pixel column 

DX = pixel row 

BH = page 


Note: Coordinates and pixel value depend on the current video mode. Setting bit 
7 of AL causes the pixel value in AL to be XORed with the current value of the pixel. 


0D Read pixel


Additional Call Registers Result Registers 
CX = pixel column AL = pixel value 
DX = pixel row 

BH = page 


0E TTY character output 


Additional Call Registers Result Registers 


AL = character None 
BH = page 
BL = foreground color 




Note: Writes a character to the display and updates cursor position. TTY mode
indicates minimal character processing. ASCII codes for bell, backspace, linefeed, and 
carriage return are translated into the appropriate actions. 


AH AL Function 


0F XX Get video mode


Additional Call Registers Result Registers 
None AH = width of screen in characters 


AL = video mode 
BH = active display page 


Note: See Table D-2 for a list of possible video modes. 
10 00 Subfunction 00H: set palette register to color correspondence 


Additional Call Registers Result Registers 
AL = 00H None 
BH = color 


CL = palette register (00H to 0FH)
10 01 Subfunction 01H: set border color 


Additional Call Registers Result Registers 
AL = 01H None


BH = border color 
10 02 Subfunction 02H: set palette and border 


Additional Call Registers Result Registers 
AL = 02H None 
ES:DX = address of color list 

13 Write string 
Additional Call Registers Result Registers 
AL = write mode None 


= 00H, attribute in BL, cursor not moved 
= 01H, attribute in BL, cursor moved 
= 02H, attributes follow char, cursor not moved 
= 03H, attributes follow char, cursor moved 
ES:BP = address of string 
CX = character count 
DH = initial row position 
DL = initial column position 
BH = page 


Note: For AL = 00 and 01, the string consists of characters only, which will 
all be displayed with the attribute in BL. For AL = 02 and 03, the data is stored with the 
attributes (char, attrib, char, attrib, and so on). 
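
The two string layouts in the note can be illustrated with a short Python sketch. The function name `string_data` and the sample attribute values are assumptions made for illustration; the sketch only builds the bytes that ES:BP would point to and does not call the BIOS.

```python
def string_data(text: bytes, attrs=None) -> bytes:
    """Build the data for INT 10H function 13H.

    attrs is None  -> write modes 00H/01H: characters only (attribute in BL).
    attrs is given -> write modes 02H/03H: char, attrib, char, attrib, ...
    """
    if attrs is None:
        return bytes(text)
    if len(attrs) != len(text):
        raise ValueError("one attribute is needed per character")
    out = bytearray()
    for ch, attr in zip(text, attrs):
        out += bytes((ch, attr))  # each character is followed by its attribute
    return bytes(out)


print(string_data(b"AB"))                # characters only
print(string_data(b"AB", [0x07, 0x1E]))  # interleaved characters and attributes
```

In both layouts CX holds the character count (2 in this example), not the number of bytes in the buffer.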




Table D-2: Video Modes and Their Definition 


AL    Pixels      Characters   Char box   Text/graph   Colors   Adapter   Max pages   Buffer start
00H   320 x 200   40 x 25      8 x 8      text         16*      CGA       8           B8000h
      320 x 350   40 x 25      8 x 14     text         16*      EGA       8           B8000h
      360 x 400   40 x 25      9 x 16     text         16*      VGA       8           B8000h
      320 x 400   40 x 25      8 x 16     text         16*      MCGA      8           B8000h
01H   320 x 200   40 x 25      8 x 8      text         16       CGA       8           B8000h
      320 x 350   40 x 25      8 x 14     text         16       EGA       8           B8000h
      360 x 400   40 x 25      9 x 16     text         16       VGA       8           B8000h
      320 x 400   40 x 25      8 x 16     text         16       MCGA      8           B8000h
02H   640 x 200   80 x 25      8 x 8      text         16*      CGA       8           B8000h
      640 x 350   80 x 25      8 x 14     text         16*      EGA       8           B8000h
      720 x 400   80 x 25      9 x 16     text         16*      VGA       8           B8000h
      640 x 400   80 x 25      8 x 16     text         16*      MCGA      8           B8000h
03H   640 x 200   80 x 25      8 x 8      text         16       CGA       8           B8000h
      640 x 350   80 x 25      8 x 14     text         16       EGA       8           B8000h
      720 x 400   80 x 25      9 x 16     text         16       VGA       8           B8000h
      640 x 400   80 x 25      8 x 16     text         16       MCGA      8           B8000h
04H   320 x 200   40 x 25      8 x 8      graph        4        CGA       1           B8000h
      320 x 200   40 x 25      8 x 8      graph        4        EGA       1           B8000h
      320 x 200   40 x 25      8 x 8      graph        4        VGA       1           B8000h
      320 x 200   40 x 25      8 x 8      graph        4        MCGA      1           B8000h
05H   320 x 200   40 x 25      8 x 8      graph        4*       CGA       1           B8000h
      320 x 200   40 x 25      8 x 8      graph        4*       EGA       1           B8000h
      320 x 200   40 x 25      8 x 8      graph        4*       VGA       1           B8000h
      320 x 200   40 x 25      8 x 8      graph        4*       MCGA      1           B8000h
06H   640 x 200   80 x 25      8 x 8      graph        2        CGA       1           B8000h
      640 x 200   80 x 25      8 x 8      graph        2        EGA       1           B8000h
      640 x 200   80 x 25      8 x 8      graph        2        VGA       1           B8000h
      640 x 200   80 x 25      8 x 8      graph        2        MCGA      1           B8000h
07H   720 x 350   80 x 25      9 x 14     text         mono     MDA       8           B0000h
      720 x 350   80 x 25      9 x 14     text         mono     EGA       4           B0000h
      720 x 400   80 x 25      9 x 16     text         mono     VGA       8           B0000h
08H   reserved
09H   reserved
0AH   reserved
0BH   reserved
0CH   reserved
0DH   320 x 200   40 x 25      8 x 8      graph        16       EGA       2/4         A0000h
      320 x 200   40 x 25      8 x 8      graph        16       VGA       8           A0000h
0EH   640 x 200   80 x 25      8 x 8      graph        16       EGA       1/2         A0000h
      640 x 200   80 x 25      8 x 8      graph        16       VGA       4           A0000h
0FH   640 x 350   80 x 25      9 x 14     graph        mono     EGA       1           A0000h
      640 x 350   80 x 25      8 x 14     graph        mono     VGA       2           A0000h
10H   640 x 350   80 x 25      8 x 14     graph        4        EGA       1/2         A0000h
      640 x 350   80 x 25      8 x 14     graph        16       VGA       2           A0000h
11H   640 x 480   80 x 30      8 x 16     graph        2        VGA       1           A0000h
      640 x 480   80 x 30      8 x 16     graph        2        MCGA      1           A0000h
12H   640 x 480   80 x 30      8 x 16     graph        16       VGA       1           A0000h
13H   320 x 200   40 x 25      8 x 8      graph        256      VGA       1           A0000h
      320 x 200   40 x 25      8 x 8      graph        256      MCGA      1           A0000h


* color burst off 




SECTION D.4: INT 12H 


Get conventional memory size 


Call Registers Result Registers 
None AX = memory size (KB) 


Note: Returns amount of conventional memory. 


SECTION D.5: INT 14H 


AH Function 


00 Initialize COM port 


Additional Call Registers Result Registers 
AL = parameter (see below) AH = port status (see below) 
DX = port number (0 if COM1, AL = modem status (see below) 


1 if COM2, etc.) 
Note 1: The parameter byte in AL is defined as follows: 


7 6 5 4 3 2 1 0   Indicates
x x x             Baud rate (000=110, 001=150,
                  010=300, 011=600, 100=1200,
                  101=2400, 110=4800, 111=9600)
      x x         Parity (01=odd, 11=even, x0=none)
          x       Stop bits (0=1, 1=2)
            x x   Word length (10=7 bits, 11=8 bits)
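
As a worked example of Note 1, the following Python sketch packs the four fields into the parameter byte for AL. The function name `com_parameter` and its argument style are illustrative assumptions; the field positions (baud rate in bits 7-5, parity in bits 4-3, stop bits in bit 2, word length in bits 1-0) follow the layout above.

```python
BAUD_CODES = {110: 0b000, 150: 0b001, 300: 0b010, 600: 0b011,
              1200: 0b100, 2400: 0b101, 4800: 0b110, 9600: 0b111}
PARITY_CODES = {"none": 0b00, "odd": 0b01, "even": 0b11}
WORD_CODES = {7: 0b10, 8: 0b11}


def com_parameter(baud: int, parity: str, stop_bits: int, word_length: int) -> int:
    """Pack the INT 14H function 00H parameter byte (the value placed in AL)."""
    if stop_bits not in (1, 2):
        raise ValueError("stop_bits must be 1 or 2")
    return ((BAUD_CODES[baud] << 5)        # bits 7-5: baud rate
            | (PARITY_CODES[parity] << 3)  # bits 4-3: parity
            | ((stop_bits - 1) << 2)       # bit 2: 0 = 1 stop bit, 1 = 2 stop bits
            | WORD_CODES[word_length])     # bits 1-0: word length


# 9600 baud, no parity, 1 stop bit, 8-bit word
print(hex(com_parameter(9600, "none", 1, 8)))  # 0xe3
```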


Note 2: The port status returned in AH is defined as follows: 


7 6 5 4 3 2 1 0   Indicates
1                 Timed-out
  1               Transmit shift register empty
    1             Transmit holding register empty
      1           Break detected
        1         Framing error detected
          1       Parity error detected
            1     Overrun error detected
              1   Received data ready
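
A program usually tests individual bits of the port status rather than comparing the whole byte. The Python sketch below decodes a status value into the condition names from Note 2; the function name `decode_port_status` and the sample value 61H are assumptions made for illustration.

```python
# Index in this list = bit number in AH (bit 0 first, bit 7 last).
PORT_STATUS_BITS = [
    "Received data ready",              # bit 0
    "Overrun error detected",           # bit 1
    "Parity error detected",            # bit 2
    "Framing error detected",           # bit 3
    "Break detected",                   # bit 4
    "Transmit holding register empty",  # bit 5
    "Transmit shift register empty",    # bit 6
    "Timed-out",                        # bit 7
]


def decode_port_status(ah: int) -> list:
    """Return the names of all conditions whose bit is set in AH."""
    return [name for bit, name in enumerate(PORT_STATUS_BITS) if ah & (1 << bit)]


print(decode_port_status(0x61))
```

The same approach applies to the modem status byte in AL, with a different list of names.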


Note 3: The modem status returned in AL is defined as follows: 


7 6 5 4 3 2 1 0   Indicates
1                 Received line signal detect
  1               Ring indicator
    1             DSR (data set ready)
      1           CTS (clear to send)
        1         Change in receive line signal detect
          1       Trailing edge ring indicator
            1     Change in DSR status
              1   Change in CTS status




01 Write character to COM port 


Additional Call Registers Result Registers 


AL = character AH bit 7 = 0 if successful, 1 if not 
DX = port number (0 if COM1, AH bits 0—6 = status if successful 
1 if COM2, etc.) AL = character 


Note: The status byte in AH, bits 0—6, after the call is as follows: 


6 5 4 3 2 1 0   Indicates
1               Transmit shift register empty
  1             Transmit holding register empty
    1           Break detected
      1         Framing error detected
        1       Parity error detected
          1     Overrun error detected
            1   Receive data ready


02 Read character from COM port 


Additional Call Registers Result Registers 
DX = port number (0 if COM1, AH bit 7 = 0 if successful, 1 if not 
1 if COM2, etc.) AH bits 0—6 = status if successful 


AL = character read 


Note: The status byte in AH, bits 1—4, after the call is as follows: 


4 3 2 1   Indicates
1         Break detected
  1       Framing error detected
    1     Parity error detected
      1   Overrun error detected


03 Read COM port status 


Additional Call Registers Result Registers 
DX = port number (0 if COM1, AH = port status 
1 if COM2, etc.) AL = modem status 


Note: The port status and modem status returned in AH and AL are the same for- 
mat as INT 14H function 00H, described above. 


04 Extended initialize COM port 


Additional Call Registers Result Registers 
AL = 00H (break), AH = port status (see function AH = 0) 
01H (no break) AL = modem status (see function AH = 0) 


DX = port number (0 if COM1, 
1 if COM2, etc.) 
BH = parity = 00H none 
= 01H odd 
= 02H even 




= 03H stick parity odd 
= 04H stick parity even 
BL = stop bits = 00H (one stop bit) 
= 01H (1.5 bits for 5-bit word) 
= 01H (2 bits for > 5-bit word) 
CH = word length = 00H 5-bit 
= 01H 6-bit
= 02H 7-bit
= 03H 8-bit
CL = baud rate = 00H 110 baud 
= 01H 150 baud 
= 02H 300 baud 
= 03H 600 baud 
= 04H 1200 baud 
= 05H 2400 baud 
= 06H 4800 baud 
= 07H 9600 baud 
= 08H 19200 baud 


05 Extended COM port control 


Additional Call Registers Result Registers 
AL = 00H (read control register),                If read subfunction,
   = 01H (write to control register)             BL = modem control register
DX = port number (0 if COM1, 1 if COM2, etc.)    If write subfunction,
BL = modem control register                      AL = modem status
     (see Figure 9-14)                                (see Figure 9-16)
     bits 7-5: reserved                          AH = line status
     bit 4: loop                                      (see Figure 9-15)
     bit 3: out2
     bit 2: out1
     bit 1: RTS
     bit 0: DTR


Note: Subfunction AL = 00H returns the modem control register contents in BL. 
Subfunction AL = 01H writes the contents of BL into the modem control register and 
returns modem and line status register contents in AL and AH. 


SECTION D.6: INT 16H -- KEYBOARD 


AH Function 


00H Keyboard read 


Additional Call Registers Result Registers
None AH = key scan code 


AL = ASCII char 
Note: Reads one character from the keyboard buffer and updates the head 
pointer. 




01H Get keyboard status 


Additional Call Registers Result Registers 
None If no key waiting, 
ZF = 1. 

If key waiting, 

ZF = 0, 


AH = key scan code, 
AL = ASCII char. 


Note: If a key is waiting, the scan code and character are returned in AH and AL, 
but the head pointer of the keyboard buffer is not updated. 


02H Get shift status 


Additional Call Registers Result Registers 


None AL = status byte 
bit 7: Insert pressed 

bit 6: Caps Lock pressed 

bit 5: Num Lock pressed 

bit 4: Scroll Lock pressed 

bit 3: Alt pressed 

bit 2: Ctrl pressed 

bit 1: Left Shift pressed 

bit 0: Right Shift pressed 


Note: The keyboard status byte returned in AL indicates whether certain keys 
have been pressed. If the bit = 1, the key has been pressed. 
03H Set typematic rate 


Additional Call Registers Result Registers 


AL = 05H None
BH = repeat delay (see below) 
BL = repeat rate (see below) 


Note: Sets the rate at which repeated keystrokes are accepted. 
The delay value in BH can be 00H (for 250), 01H (for 500), 02H (for 750), or 03H
(for 1000). All values are in milliseconds. The repeat rate in BL represents the number
of characters per second. Options are:


00H: 30.0    0BH: 10.9    16H: 4.3
01H: 26.7    0CH: 10.0    17H: 4.0
02H: 24.0    0DH: 9.2     18H: 3.7
03H: 21.8    0EH: 8.6     19H: 3.3
04H: 20.0    0FH: 8.0     1AH: 3.0
05H: 18.5    10H: 7.5     1BH: 2.7
06H: 17.1    11H: 6.7     1CH: 2.5
07H: 16.0    12H: 6.0     1DH: 2.3
08H: 15.0    13H: 5.5     1EH: 2.1
09H: 13.3    14H: 5.0     1FH: 2.0
0AH: 12.0    15H: 4.6     20H to FFH - reserved




10H Extended keyboard read 


Additional Call Registers Result Registers 
None AH = key scan code 


AL = ASCII char 
Note: Used in place of INT 16H function 00H to allow program to detect F11,
F12, and other keys of the extended keyboard. After the read, the head pointer of the
keyboard buffer is updated.


11H Extended keyboard status 


Additional Call Registers Result Registers 
None If no key waiting,
ZF = 1.
If key waiting,
ZF = 0,
AH = key scan code,
AL = ASCII char.


Note: This function is used instead of INT 16H function 01H so that programs can 
detect keys of the extended keyboard such as F11 and F12. If a key is waiting, the scan 
code and character are returned in AH and AL, but the head pointer of the keyboard buffer 
is not updated. 


12H Extended shift status 


Additional Call Registers Result Registers 
None AL = shift status
bit 7: Insert locked 

bit 6: Caps Lock locked 

bit 5: Num Lock locked 

bit 4: Scroll Lock locked 

bit 3: Alt pressed 

bit 2: Ctrl pressed 

bit 1: Left Shift pressed 

bit 0: Right Shift pressed 


AH = extended shift status 
bit 7: SysRq pressed 

bit 6: Caps Lock pressed 
bit 5: Num Lock pressed 
bit 4: Scroll Lock pressed 
bit 3: Right Alt pressed 

bit 2: Right Ctrl pressed 
bit 1: Left Alt pressed 

bit 0: Left Ctrl pressed 


Note: The keyboard status bytes returned in AL and AH indicate whether certain 
keys have been pressed. If the bit = 1, the key has been pressed. 




SECTION D.7: INT 1AH 


AH Function


00H Read system-timer time counter 


Additional Call Registers Result Registers 


None CX = high portion of count 
DX = low portion of count 

AL = 0 if 24 hours have not passed since last read 

> 0 if 24 hours have passed since last read 


Note: This function returns the number of ticks since midnight. A second is about
18.2 ticks. When the number of ticks indicates that 24 hours have passed, AL is incremented
and the tick count is reset to zero. Calling this function resets AL, so whether 24
hours have passed can be determined only once a day.
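
The tick count returned in CX:DX can be converted to a time of day with simple arithmetic. The Python sketch below is illustrative only: the function name `ticks_to_time` is an assumption, and it uses the approximate rate of 18.2 ticks per second given in the note (written as 182 ticks per 10 seconds to stay in integer arithmetic), so the result drifts slightly from the true time.

```python
def ticks_to_time(cx: int, dx: int) -> tuple:
    """Convert the INT 1AH function 00H tick count to (hours, minutes, seconds).

    CX is the high portion and DX the low portion of the count.
    """
    ticks = (cx << 16) | dx          # combine the two 16-bit halves
    seconds = ticks * 10 // 182      # about 18.2 ticks per second
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return hours, minutes, secs


print(ticks_to_time(0, 65520))  # 65520 ticks / 18.2 = 3600 seconds
```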


01H Set system-timer time counter 


Additional Call Registers Result Registers 


CX = high portion of tick count None 
DX = low portion of tick count 


Note: Calling this function will cause the timer overflow flag to be reset. 


02H Read real-time clock time


Additional Call Registers Result Registers 


None CH = hours 
CL = minutes 
DH = seconds 


DL = 01 for daylight savings option 
= 00 for no option 
CF = 0 if clock operating, otherwise = 1 


Note: Hours, minutes, and seconds are returned in BCD format. This function is 
used to get the time in the CMOS time/date chip. 
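
Because the real-time clock values are in packed BCD, each byte holds two decimal digits, one per nibble. The following Python sketch shows the conversion in both directions; the helper names `bcd_to_int` and `int_to_bcd` are assumptions for illustration.

```python
def bcd_to_int(value: int) -> int:
    """Convert one packed BCD byte (e.g., 59H) to its decimal value (59)."""
    high, low = value >> 4, value & 0x0F
    if high > 9 or low > 9:
        raise ValueError("not a valid packed BCD byte")
    return high * 10 + low


def int_to_bcd(number: int) -> int:
    """Convert a decimal value from 0 to 99 to one packed BCD byte."""
    if not 0 <= number <= 99:
        raise ValueError("packed BCD byte holds 0 to 99 only")
    return ((number // 10) << 4) | (number % 10)


# CH = 23H, CL = 59H, DH = 07H would mean 23:59:07.
print(bcd_to_int(0x23), bcd_to_int(0x59), bcd_to_int(0x07))
```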


03H Set real-time clock time 


Additional Call Registers Result Registers 


CH = hours None 
CL = minutes 
DH = seconds 


DL = 01 for daylight savings option 
= 00 for no option 


Note: Hours, minutes, and seconds are in BCD format. This function is used to set
the time in the CMOS time/date chip. 




04H Read real-time clock date 


Additional Call Registers Result Registers 
None CH = century (19 or 20) 
CL = year 

DH = month 

DL = day 


CF = 0 if clock operating, otherwise = 1


Note: Century, year, month, and day are in BCD format. This function is used to 
get the date in the CMOS time/date chip. 


05H Set real-time clock date 


Additional Call Registers Result Registers 
CH = century (19 or 20) None 
CL = year 

DH = month 

DL = day 


Note: Century, year, month, and day are in BCD format. This function is used to 
set the date in the CMOS time/date chip. 




APPENDIX E 


I/O ADDRESS MAPS 


SECTION E.1: ORIGINAL 80286 IBM PC I/O ADDRESS MAP


Hex Range     Device
000-01F       DMA controller 1, 8237A-5
020-03F       Interrupt controller 1, 8259A, Master
040-05F       Timer, 8254-2
060-06F       8042 (keyboard)
070-07F       Real-time clock, NMI (nonmaskable interrupt) mask
080-09F       DMA page register, 74LS612
0A0-0BF       Interrupt controller 2, 8259A
0C0-0DF       DMA controller 2, 8237A-5
0F0           Clear math coprocessor busy
0F1           Reset math coprocessor
0F8-0FF       Math coprocessor
1F0-1F8       Fixed disk
200-207       Game I/O
20C-20D       Reserved
21F           Reserved
278-27F       Parallel printer port 2
2B0-2DF       Alternate enhanced graphics adapter
2E1           GPIB (adapter 0)
2E2 & 2E3     Data acquisition (adapter 0)
2F8-2FF       Serial port 2
300-31F       Prototype card
360-363       PC network (low address)
364-367       Reserved
368-36B       PC network (high address)
36C-36F       Reserved
378-37F       Parallel printer port 1
380-38F       SDLC, bisynchronous 2
390-393       Cluster
3A0-3AF       Bisynchronous 1
3B0-3BF       Monochrome display and printer adapter
3C0-3CF       Enhanced graphics adapter
3D0-3DF       Color/graphics monitor adapter
3F0-3F7       Disk controller
3F8-3FF       Serial port 1
6E2 & 6E3     Data acquisition (adapter 1)
790-793       Cluster (adapter 1)
AE2 & AE3     Data acquisition (adapter 2)
B90-B93       Cluster (adapter 2)
EE2 & EE3     Data acquisition (adapter 3)
1390-1393     Cluster (adapter 3)
22E1          GPIB (adapter 1)
2390-2393     Cluster (adapter 4)
42E1          GPIB (adapter 2)
62E1          GPIB (adapter 3)
82E1          GPIB (adapter 4)
A2E1          GPIB (adapter 5)
C2E1          GPIB (adapter 6)
E2E1          GPIB (adapter 7)


SECTION E.2: Dell x86 PC I/O ADDRESS MAP 


Use the System Information utility in Windows to get the I/O address map for
your x86 PC.
Hex Range Device 


0x00000000—0x00000CF7 
0x00000000—0x00000CF7 
0x00000010-0x0000001F 
0x00000020—0x00000021 
0x00000024—0x00000025 
0x00000028—0x00000029 
0x0000002C—0x0000002D 
0x0000002E—0x0000002F 
0x00000030-0x00000031
0x00000034—0x00000035 
0x00000038-0x00000039 
0x0000003C—0x0000003D 
0x00000040-0x00000043 
0x0000004E—0x0000004F 
0x00000050-0x00000053 
0x00000060—0x00000060 
0x00000061-0x00000061
0x00000062—0x00000062 
0x00000063-0x00000063
0x00000064—0x00000064 
0x00000065—0x00000065 
0x00000066—0x00000066 
0x00000067—0x00000067 
0x00000070—0x00000071 
0x00000072—0x00000077 
0x00000080—0x00000085 
0x00000086—0x00000086 
0x00000087—0x0000008F 
0x00000090-0x00000091 
0x00000092—0x00000092 
0x00000093—0x0000009F 
0x000000A0-0x000000A1
0x000000A4—0x000000A5 
0x000000A8-0x000000A9


PCI bus
Direct memory access controller 
Direct memory access controller 
System board 

Programmable interrupt controller 
Programmable interrupt controller 
Programmable interrupt controller 
System board 

Programmable interrupt controller 
Programmable interrupt controller 
Programmable interrupt controller 
Programmable interrupt controller 
System timer 

System board 

System timer 

Standard 101/102-Key PS/2 Keyboard 
System speaker 

Standard 101/102-Key PS/2 Keyboard 
System speaker 

Standard 101/102-Key PS/2 Keyboard 
System speaker 

Standard 101/102-Key PS/2 Keyboard 
System speaker 

System CMOS/real-time clock 
System CMOS/real-time clock 

Direct memory access controller 
System board 

Direct memory access controller 
Direct memory access controller 
System board 

Direct memory access controller 
System board 

Programmable interrupt controller 
Programmable interrupt controller 




0x000000AC-0x000000AD    Programmable interrupt controller
0x000000B0-0x000000B1    Programmable interrupt controller
0x000000B2-0x000000B2    System board
0x000000B3-0x000000B3    System board
0x000000B4-0x000000B5    Programmable interrupt controller
0x000000B8-0x000000B9    Programmable interrupt controller
0x000000BC-0x000000BD    Programmable interrupt controller
0x000000C0-0x000000DF    Direct memory access controller
0x000000F0-0x000000FF    Numeric data processor
0x00000170-0x00000177    Secondary IDE Channel
0x000001F0-0x000001F7    Primary IDE Channel
0x00000274-0x00000277    ISAPNP Read Data Port
0x00000279-0x00000279    ISAPNP Read Data Port
0x00000376-0x00000376    Secondary IDE Channel
0x000003B0-0x000003BB    Mobile Intel(R) 955XM/945GM/PM/GMS/940GML Express PCI Express Root Port - 27A1
0x000003B0-0x000003BB    NVIDIA Quadro FX 1500M
0x000003C0-0x000003DF    Mobile Intel(R) 955XM/945GM/PM/GMS/940GML Express PCI Express Root Port - 27A1
0x000003C0-0x000003DF    NVIDIA Quadro FX 1500M
0x000003F6-0x000003F6    Primary IDE Channel
0x000004D0-0x000004D1    System board
0x00000809-0x00000809    System board
0x00000910-0x0000091F    System board
0x00000920-0x0000092F    System board
0x00000930-0x0000097F    System board
0x00000A79-0x00000A79    ISAPNP Read Data Port
0x00000C80-0x00000CAF    System board
0x00000CB0-0x00000CBB    System board
0x00000CBC-0x00000CBF    System board
0x00000CC0-0x00000CFF    System board
0x00000D00-0x0000FFFF    PCI bus
0x00001000-0x00001005    System board
0x00001006-0x00001007    System board
0x00001008-0x0000100F    System board
0x0000100A-0x00001059    System board
0x00001010-0x0000102F    System board
0x00001060-0x0000107F    System board
0x00001080-0x000010BF    System board
0x000010C0-0x000010DF    Intel(R) 82801G (ICH7 Family) SMBus Controller - 27DA
0x000010C0-0x000010DF    System board
0x0000BF20-0x0000BF3F    Intel(R) 82801G (ICH7 Family) USB Universal Host Controller - 27CB
0x0000BF40-0x0000BF5F    Intel(R) 82801G (ICH7 Family) USB Universal Host Controller - 27CA
0x0000BF60-0x0000BF7F    Intel(R) 82801G (ICH7 Family) USB Universal Host Controller - 27C9
0x0000BF80-0x0000BF9F    Intel(R) 82801G (ICH7 Family) USB Universal Host Controller - 27C8
0x0000BFA0-0x0000BFAF    Intel(R) 82801GBM/GHM (ICH7-M Family) Serial ATA Storage Controller - 27C4
0x0000D000-0x0000DFFF    Intel(R) 82801G (ICH7 Family) PCI Express Root Port - 27D6
0x0000E000-0x0000EFFF    Mobile Intel(R) 955XM/945GM/PM/GMS/940GML Express PCI Express Root Port - 27A1
0x0000EF00-0x0000EF7F    NVIDIA Quadro FX 1500M
0x0000F400-0x0000F4FE    System board


APPENDIX F 


ASCII CODES


[Chart: ASCII character codes]


INDEX 


16-bit architecture, 246-250, 278-282 
32-bit architecture, 218-225 
addressing modes, 219 
registers, 29 
64-bit architecture, oy 
external memory space, 632 
74LS138 decoder, 2671, 297 
74LS244 octal buffer, 292, 294 
74LS245 bidirectional buffer, 304 
74LS280 parity generator/checker, 276 
74LS373 D latch, 292 
80286 microprocessor, 242-246, 532 
address bus, 243 
data bus, 243 
80386 microprocessor, 530-553 
bus cycle definition, 541 
descriptor tables, 549 
evolution from 80186, 530 
general registers as pointers, 534 
major changes, 532 
new instructions, S3 
paging, 550 
pins, 599 
protected mode, 545 
real mode, 593 
registers, SDD 
scaled index addressing mode, 534 
virtual 8086 mode, 552 
virtual memory, 545 
80486 microprocessor, 590-595 
enhancements, 590 
pipelining, 594 
8088 microprocessor, 228-233 
address bus, 228 
bus timing, 230 
control bus, 230 
data bus, 228 
pins, 231-233 
8237 (direct memory access), 404-412 
all mask register, 411 
clear mask register, 412 
command register, 407 


connection to CPU, 413, 415 
control registers, 407 
master clear register, 412 
mode register, 409 
pins, 413, 415 
single mask register, 411 
status register, 408 
8253/54 timer 
connection, 354 
control word, 352 
counter 0, 355 
counter 1, 356 
counter 2, 356 
initialization, 351 
programming, 358 
8255 programmable peripheral interface, 
299-310 
control word, 301 
mode selection, 300 
pins, 300 
8259 programmable interrupt controller, 
377-387 
from hardware, 393 
from microprocessor, 393 
initialization, 388 
interfacing to x86, 387 
priority, 394 
8284 clock generator, 236-237 
8288 bus controller, 233-236 
8-bit architecture, 238-242 
A 
ADC (analog-to-digital converter), 336-345 
ADC0848 chip, 337 
selecting an input channel, 338 
ADC808/809 chip, 342 
selecting an input channel, 344 
programming, 344 
address bus, 15 


address decoding, 265-269, 292-294 
addressing modes. 

See also 32-bit addressing modes 

based indexed addressing, 49 

based relative addressing, 48 

direct addressing, 47 

immediate addressing, 47 




indexed relative addressing, 48 
register addressing, 46 
register indirect addressing, 48 
segment overrides, 49 
ASCII, 112 
to BCD conversion, 112-117 
to binary (hex) conversion, 206, 209 
ASCII code, 8 
ASCII codes, 777-778 
Assembly instructions, 716-738 
ADD and ADC (addition), 92-95 
AND (logical), 102 
CALL (call procedure), 369 
CBW (convert byte to word), 180 
CLD (clear direction flag), 187 
CMP (compare), 105-109, 184 


CMPSx (compare string Byte/Word), 189 
CWD (convert word to double word), 180 
DAA (decimal adjust for addition), 115-117 
DAS (decimal adjust for subtraction), 117-118 
DIV (division), 99-101 
IDIV (integer division), 181 
IMUL (integer multiplication), 183 
INT (interrupt), 369 
INTO (interrupt on overflow), 373 
IRET (interrupt return), 369 
LODSx (load string Byte/Word), 188 
MOVSx (move string Byte/Word), 187 
MUL (multiplication), 98-99 
OR (logical), 102 
RCL (rotate thru carry left), 120-121 
RCR (rotate thru carry right), 119-120 
REP prefix (repeat), 187 
REPE prefix. 

See REPZ prefix 

REPNE prefix. 
See REPNZ prefix 
REPNZ prefix (repeat not zero), 189 
REPZ prefix (repeat zero), 189 
RETF (return FAR), 369 
ROL (rotate left), 119-120 
ROR (rotate right), 118 


SAL (shift arithmetic left). 
See SHL (shift left) 
SAR (shift arithmetic right), 184 
SCASx (scan string Byte/Word), 191 


SHL (shift left logical), 105 
SHR (shift right logical), 104 
STD (set the direction flag), 187 
STOSx (store string Byte/Word), 188 
SUB and SBB (subtraction), 96-97 
XLAT (translate), 192 
XOR (logical), 103 
Assembly language 
ADD instruction, 32 
impact on flag register, 44 
assembler, 60 
code segment, 58, 79 
comments, 56 
control transfer, 68 
CALL, 70 
FAR and NEAR, 68 
jumps, 68, 69, 70 
data segment, SI 
directive 
EVEN, 281 
directives, 56, 740-750 
.386 (386 processor and above), 220 
.86 (any x86 processor), 220 
DB (define byte), 73 
DD (define doubleword), JS 
DQ (define quadword), 75 
DT (define ten bytes), 76 
DUP (duplicate), 74 
DW (define word), 74 
END (end), NO7 
EQU (equate), iS 
EXTRN (external), 196 
ORG (origin), 73 
PAGE and TITLE, 61 
PUBLIC (public), 196 
SEGMENT (segment), 203 
fields, 56 
files, 60, 62 
labels, 56, 71, 730 
linker, 62 
MACRO definition, 147-157 
mnemonics, 56 
model definition, 56 
MOV instruction, 30 
operands, 56 
programming, 30 
reserved names, 751 




sample programs, 63, 64, 65, 66, 67 
segment definition, somo], age 8, TS 


stack segment, 57, 68, 77 
subroutines, {Al 
B 
BCD number system, gl 
addition, 114-117 
subtraction, 117-118 
to ASCII conversion, 114-117 
to ASCII conversion in C, 122-123 
binary logic, 9 
binary to ASCII conversion, 205, 207 
BIOS ROM, 39 
INT 10H (screen control), 130-137 
INT 16H (keyboard), 162-165 
bus boosting, 241 
C 
C language, 121 
bit testing, 124 
bitwise operators, 121-123 
in-line Assembly, 224-225 
C/C++ programming, 306-310 
Linux, 308 
CGA (color graphics adapter), 440 
code segment, 33, 203 
address, 34 
CodeView, 220 
COM files, 83 
computers 
memory map, 38 
organization, 14 
terminology, 13 
CPU (central processing unit) 
how it works, 17 
inside, 16 
relation to RAM and ROM, 15 
CPU identification, 615 


D 


DAC (digital-to-analog converter), 332-335 
DAC808, 382 
generating a sine wave, 332 


MC1408 DAC, 332 
data bus, 14 
data integrity, 273-277 

checksum byte, 273 

checksum program, 23 

error detection, 274 

parity bit, 275 
data segment, 35, 203 

address, 36 
DEBUG 

A, the assemble command, 702 

D, the dump command, 707 

E, the enter command, 708 

entering and exiting, 700 

examining/altering registers, 700 

examining/altering the flag register, 710 

F, the fill command, 706 

G, the go command, 703 

Q, the quit command, 700 

T, the trace command, 705 

U, the unassemble command, 702 
decimal to binary (hex) conversion, 206 
direct memory access (DMA), 402-420 

16-bit channels, 419 

8237 DMA. 

See 8237 DMA 

channel 1, 417 

channel 2, 419 

channel assignment, 416 

channel priority, 419 

concept, 402 

initialization, 404 
DMA (direct memory access), 240 
DOS INT 21H (I/O), 137-147 
DSP (digital signal processing). 

See MMX 
E 
EGA (enhanced graphics adapter), 441 
emu8086 assembler, 80-82 
EXE files, 83 
F 
flowcharts, 84-87, 108, 210 


full segment definition, 200-203 




H 
hard disks, 492-499 
boot record, 495 
bootable and nonbootable disks, 495 
capacity, 492 
clusters, 496 
disk caching, 498 
disk reliability, 499 
ESDI (enhanced small device interface), 496 
FAT (file allocation table), 494-495 
formatting, 493 
IDE (integrated device electronics), 497 
organization, 492 
partitioning, 495 
SCSI (small computer system interface), 497 
speed, 496 
hex to decimal conversion, 205 
high memory area (HMA), 250 
HyperTerminal, 455 
I 
I/O address maps 
80286, 713 
System Information, 774 
IC technology, 638-655 
advances, 642 
capacitance derating, 646 
crosstalk, 652 
decoupling capacitors, 651 
evolution, 643 
fan-out, 644 
FIT (failure in time), 653 
gallium arsenide (GaAs) chips, 655 
ground bounce, 650 
hard error, 654 
history, 641 
input/output characteristics, 640 
inverters, 639 
line ringing, 653 
logic families, 639 
mean time between failures (MTBF), 654 
MOS vs. bipolar transistors, 638 
power dissipation, 648 
soft error, 654 
IEEE floating-point standard, 504-505 
format, 506 
input/output (INT 21H), 137-147 
interrupts 
categories, 370 
handling, 371 
INT 00 (divide error), 371 
INT 01 (single step), 372 
INT 02 (nonmaskable interrupt), 372 
INT 03 (breakpoint), 373 
INT 04 (signed number overflow), 373 
INT 10H (input/output), 759-764 
INT 12H (memory size), 765 
INT 14H (COM port), 765-767 
INT 16H (keyboard), 767-769 
INT 1AH (timer/time), ` FIIO- T 
INT 21H (operating system), 754-756 
INT 33H (mouse), 756-759 
interrupt service routine (ISR), 368 
sources, 390 
vector table, 368, 375 
ISA bus, 238-242 
8-bit and 16-bit I/O, 668 
address bus, 239, 247 
address bus signals, 660 
bus history, 238 
control bus, 240, 247 
data bus, 239, 247 
limitations, 675 
local bus, 238 
memory control signals, 660-662 
PC104 bus and embedded PC, 676 
peripherals, 673 
system bus, 238 
timing, 662-664, 671-672 
ISA expansion slot, 248 
K 
keyboard (INT 16H), 162-165 
keyboards 
BIOS INT 16H, 472 
buffer, 476 
ECP (extended capability port), 486 
EPP (enhanced parallel port), 486 
first keyboard status byte, 470 




head pointer, 476 
operation, 464-468, 469, 474 
overrun, 475 
printing a character, 483 
programming, 468-478 
scan codes, 469 
second keyboard status byte, 471 
SPP (standard parallel port), 484 
tail pointer, 476 
L 
LCD (liquid crystal display), 316-322 
busy flag, 320 
commands, 307 
cursor position, 321 
pin descriptions, 316 
programming, 322 
timing, a2 
little endian convention, 36, 221 
local bus, 678 
logic design, 10 
logic gates, 9 
AND gate, 9 
inverter, 10 
NAND and NOR gates, 10 
OR gate, 9 
tri-state buffer, 9 
XOR gate, 10 
logical and physical address, 33, 42 
in the code segment, 34 
in the data segment, 36 
in the stack segment, 41 
overlapping, 42 
look-up tables, 192 
looping 
using the zero flag, 46 
M 
MASM, 220 
MAX232/3 (EIA-232 driver/receiver), 451 
memory, 256-265 
access time, 562 
bandwidth, 282 
banks, 215,218 
burst mode, 581 


cache, 570 
coherency, Sad 
direct-mapped, DA 
fill block size, Sal 
fully associative, ol 
levels, 578 
organization, Sl 
replacement policy, a1] 
set associative, 573 
updating main memory, 576 

capacity, 256 

cycle time, 560 

DDR (double data rate) RAM, 581 

DIMM and SIMM memory modules, 665 

DRAM 
interleaving, 565 
page mode, 566 
standard mode, 562 
static column mode, 567 
types, 562 

DRAM (dynamic RAM), 262 

EDO (extended data-out) DRAM, 578 


EEPROM (electrically erasable PROM), 258 

EPROM (erasable PROM), 258 
Flash (Flash EEPROM), 259 
mask ROM, 260 
memory map, 666 
NV-RAM (nonvolatile RAM), 265 
organization, 256 
PROM (programmable ROM), 29V 
RAM (random access memory), 260 
Rambus DRAM, 581 
ROM (read-only memory), 257 
ROM duplicate, 666 
SDRAM (synchronous DRAM), hie 
shadow RAM, 667 
speed, 256, 279 
SRAM (static RAM), 260 
wait states, 560 
memory map, 269-272 
BIOS data area, 270 
conventional memory, 270 
reset address, BII 
video display RAM, 270 
MMX (multimedia extension), 613-618 
CPUID instruction, 616 




data types, 615 
DSP (digital signal processing), 613 
register aliasing, 614 
modules, 196-205 
pass parameters, 211-213 
monitors, 424-433 
character box, 428, 435, 438 
color, 427 
cursor position, 433 
digital vs. analog, 427 
dot pitch, 426 
graphic mode, 440-442 
monitor size, 426 
phosphors, 427 
pixel programming, 442 
scrolling, 435 
text mode, 436 
video controller, 428 
video modes, 429, 433 
Moore’s law, 635 
mouse (INT 33H), 166-173 
music generation, 359-364 
using C#, 364-365 
N 
number systems, 
binary, 
addition, 
binary to and from hexadecimal, 
binary to decimal, 
counting in bases, 
decimal, 
decimal to binary, 
hexadecimal, 
addition, 
subtraction, 
hexadecimal to decimal, 
P 
PCI bus 
bus arbitration, 
bus protocol, 
connector (expansion slot), 
local bus, 
master and slave, 


performance, 683 
plug-and-play, 680 
Pentium II to Pentium IV microprocessor 
cache, 626 
hyper-threading technology (HTT), 628 
multicore technology, 629 
streaming SIMD extension (SSE), 630 
Pentium microprocessor, 596-602 
features, 597-602 
overdrive technology, 602 
Pentium Pro microprocessor, 609-613 
architecture, 610 
branch prediction, 612 
bus vs. internal frequency, 613 
out-of-order execution, 611 
superpipelined, 610 
superscalar, 610 
pipelining, 2 
printers, 478-486 
addresses, 481 
BIOS INT 17H, 482 
Centronics printer interface, 478 
control characters, 482 
control signals, 480 
status signals, 478 
time-out, 481 
pseudocode, 84-87, 108, 210 
R 
RAM, 38 
video RAM, 39 
registers, 29 
Extra Segment (ES), 38 
flag register, 43 
Stack Pointer (SP), 41 
Stack Segment (SS), 41 
RISC architecture, 602-609 
comparison to CISC, 606 
features, 603-605 
Harvard and von Neumann, 605 
RS232. 
See serial communication 
S 
screen control (INT 10H), 130-137 




serial communication, 448-454 
data communication classification, 452 
data framing, 449 
data transfer rate, 450 
full-duplex transmission, 449 
half-duplex transmission, 449 
handshaking, 453 
MAX232/3 (EIA-232 driver/receiver), 451 
programming, 455-458 
signal conditioning, 342 
signed numbers (integer), 176-186 
overflow, 178-179 
speaker, 356 
stack segment, 39, 203 
access, 40 
address, 41 
described, 40 
stepper motors, 326-332 
four-step sequence, 229 
programming, 327, 
speed, 330 
step angle, 377) 
timing, 328 
torque, 330 
wave drive sequence, S51 
strings, 186-192 
structured programming, 83, 84 
SVGA (super VGA), 442 
T 
temperature sensors (LM34/35), 341 
thermistor, 341 
time delay, 357-358 
U 
USB (universal serial bus) 
architecture, 689 
cable signals, 692 
enumeration, 693 
features, 688 
hubs, 690-692 
programming, 694 


V 


VGA (video graphics array), 430, 441 
virtual memory, 545 
Visual C/C++ programming, 322 
Visual C++, 224-225 
X 
x86 
data types, 73 
I/O map, 296 
interrupts. 
See interrupts 
x86 family 
80286, 25 
80386, 25 
80486, 25 
8088, 25 
evolution, 24 
Pentium, 26 
Pentium II, 26 
Pentium III, 26 
Pentium Pro, 26 
x87 math coprocessor, 504-523 
arithmetic instructions, 520-522 
constant instructions, 523 
control instructions, 524 
integer instructions, 517, 519 
packed BCD instructions, 520 
programming, 510 
real number instructions, 519 
registers, 508 
transcendental instructions, 522 
trigonometric instructions, 517 


The x86 PC 


assembly language, design, and interfacing 
MUHAMMAD ALI MAZIDI - JANICE GILLISPIE MAZIDI - DANNY CAUSEY 


Previously published as The 80x86 IBM PC and Compatible Computers: 
Volumes I & II: Assembly Language, Design, and Interfacing, this visually 
appealing text has been widely used and praised by experts for its clarity 
and topical breadth. It provides a step-by-step, systematic approach to 
teaching the fundamentals of x86 assembly language programming and PC 
architecture. It offers readers a fun, hands-on learning experience and 
reinforces concepts with numerous examples and review questions. The text 
delves into x86 architecture, buses, interfacing techniques, system 
programming, IEEE floating-point math, USB, cache, and RISC and Harvard 
architecture. It is an ideal reference for x86 embedded system designers. 


The fifth edition of this text: 

* Covers all the x86 microprocessors from the 8086 to the 64-bit Itanium 

* Uses Assembly and C programming examples to give a deeper understanding of the x86 PC 
architecture 

* Introduces the x86 instructions with examples of how they are used 

* Provides a basic understanding of IEEE floating-point numbers and math coprocessors 

* Discusses and analyzes hardware differences among 16-bit, 32-bit, and 64-bit processors such as 
Pentium and Itanium chips 

* Discusses 8-bit, 16-bit, and 32-bit interfacing of x86 microprocessors 

* Shows a real-world approach to PC system programming by using fragments of programs from the IBM 
PC technical reference 

* Provides an introduction to the USB port and how to access it using C# 

* Compares and contrasts the x86 CPUs with RISC processors 

* Examines the x86 cache memory and its organization 

* Covers the new 64-bit features of x86 processors from Intel and AMD 

* Discusses the superscalar architecture of x86 processors with their multicore features 


The x86 PC: Assembly Language, Design, and Interfacing is the latest volume in the series of textbooks by 
Mazidi and Causey. This series of texts is widely used around the world by both industry and academics and 
has been translated into many languages. The other titles in the series are: 


The 8051 Microcontroller and Embedded Systems (2nd ed.) 
The PIC Microcontroller and Embedded Systems 
The HCS12 Microcontroller and Embedded Systems 


Titles to come include: 
The AVR Microcontroller and Embedded Systems 
PIC16 Assembly Language and Interfacing 


ISBN-13: 978-0-13-502648-9 
ISBN-10:  0-13-502648-2 


Prentice Hall 
is an imprint of 


PEARSON 
www.pearsonhighered.com 


